5: Multivariate Regression
CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences
Villanova University
Course website: www.csc.villanova.edu/~map/4510/
The slides in this presentation are adapted from:
• Andrew Ng’s ML course http://www.ml-class.org/
CSC 4510 - M.A. Papalaskari - Villanova University
Regression topics so far
• Introduction to linear regression
• Intuition – least squares approximation
• Intuition – gradient descent algorithm
• Hands on: Simple example using Excel
• How to apply gradient descent to minimize the cost function for regression
• Linear algebra refresher
What’s next?
• Multivariate regression
• Gradient descent revisited
– Feature scaling and normalization
– Selecting a good value for α
• Non-linear regression
• Solving for $\theta$ analytically (Normal Equation)
• Using Octave to solve regression problems
Multiple features (variables).
Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104            5                    1                  45                    460
1416            3                    2                  40                    232
1534            3                    2                  30                    315
852             2                    1                  36                    178
…               …                    …                  …                     …
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example.
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example.
Multiple features (variables).
Size (feet^2)   Price ($1000)
2104            460
1416            232
1534            315
852             178
…               …
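Hypothesis:
Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
For convenience of notation, define $x_0 = 1$, so that $h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$.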
Multivariate linear regression
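With $x_0 = 1$ prepended to every example, the hypothesis for all training examples can be evaluated with one matrix-vector product. The following is a minimal Octave sketch using the four houses from the table; the values in theta are hypothetical, picked by hand only to have something to multiply, and are not fitted parameters.

```octave
% Features from the housing table: size, bedrooms, floors, age.
data = [2104 5 1 45;
        1416 3 2 40;
        1534 3 2 30;
         852 2 1 36];
m = size(data, 1);             % number of training examples
X = [ones(m, 1), data];        % prepend the x0 = 1 column
theta = [80; 0.1; 20; 5; -1];  % hypothetical parameters (assumption, not fitted)
h = X * theta;                 % h(i) = theta' * x^(i) for every example i
disp(h);
```

Each row of X is one training example written as a row vector, so X * theta stacks the values $\theta^T x^{(i)}$ into a single column.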
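Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$
Parameters: $\theta_0, \theta_1, \dots, \theta_n$
Cost function: $J(\theta_0, \theta_1, \dots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $m$ is the number of training examples
Gradient descent:
Repeat {
  $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \dots, \theta_n)$
}
(simultaneously update for every $j = 0, \dots, n$)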
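As a rough illustration, the squared-error cost can be written as a one-line Octave function. The name compute_cost and the toy data below are assumptions made for this sketch; they are not part of the course materials.

```octave
% Squared-error cost J(theta) = 1/(2m) * sum((X*theta - y).^2).
% compute_cost is a hypothetical helper name chosen for this sketch.
compute_cost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

% Toy data (assumed): x0 = 1 column plus one feature, with y = x exactly.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];

J_perfect = compute_cost(X, y, [0; 1]);  % theta that fits the data exactly
J_zero    = compute_cost(X, y, [0; 0]);  % theta = 0 predicts 0 everywhere
fprintf('J at perfect fit: %g, J at theta = 0: %g\n', J_perfect, J_zero);
```

A parameter vector that fits the data exactly gives a cost of zero; any other choice gives a positive cost.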
Gradient Descent
Previously (n=1):
Repeat {
  $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
  $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
  (simultaneously update $\theta_0, \theta_1$)
}
New algorithm ($n \ge 1$):
Repeat {
  $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
  (simultaneously update $\theta_j$ for $j = 0, \dots, n$)
}
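The update rule for general $n$ might be sketched in Octave as follows. The toy data, the learning rate, and the iteration count are all assumptions chosen so that the loop converges quickly; computing the whole gradient vector before touching theta is what makes the update simultaneous.

```octave
% Toy data (assumed): x0 = 1 column plus one feature, generated from y = 1 + 2x.
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];
m = length(y);

alpha = 0.1;        % learning rate (assumed value)
num_iters = 2000;   % number of iterations (assumed value)
theta = zeros(size(X, 2), 1);

for iter = 1:num_iters
  % Gradient for every theta_j at once: (1/m) * sum((h - y) .* x_j).
  grad = (X' * (X * theta - y)) / m;
  % All components are updated together, from the same old theta.
  theta = theta - alpha * grad;
end

disp(theta');   % should be close to [1 2] for this data
```

The same loop works unchanged for any number of features, because X' * (X * theta - y) produces one partial derivative per column of X.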
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1$ = size (0-2000 feet^2), $x_2$ = number of bedrooms (1-5)
Scale each feature by its range: $x_1 = \frac{\text{size (feet}^2)}{2000}$, $x_2 = \frac{\text{number of bedrooms}}{5}$
Get every feature into approximately a $-1 \le x_i \le 1$ range.
Mean normalization
Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).
E.g. $x_1 = \frac{\text{size} - 1000}{2000}$, $x_2 = \frac{\text{number of bedrooms} - 2}{5}$
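A minimal Octave sketch of mean normalization followed by scaling is shown below. It assumes the size and bedroom columns of the housing table as data and divides by the observed range of each column; dividing by the standard deviation (std) is a common alternative. The variable names are illustrative.

```octave
% Two features from the housing table: size (feet^2) and number of bedrooms.
data = [2104 5;
        1416 3;
        1534 3;
         852 2];

mu = mean(data);             % per-column mean (1 x n row vector)
s  = max(data) - min(data);  % per-column range (assumed choice of scale)

% Subtract the mean and divide by the range, column by column
% (relies on automatic broadcasting of the 1 x n rows over all m rows).
X_norm = (data - mu) ./ s;

% Add the x0 = 1 column afterwards; it is deliberately left unscaled.
X = [ones(size(data, 1), 1), X_norm];
disp(X);
```

The values of mu and s need to be kept: a new house must be transformed with the same mu and s before the learned parameters are applied to it.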
Gradient descent
- “Debugging”: How to make sure gradient descent is working correctly.
- How to choose learning rate $\alpha$.
Making sure gradient descent is working correctly.
[Plot: $J(\theta)$ versus no. of iterations (0 to 400)]
- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
- But if $\alpha$ is too small, gradient descent can be slow to converge.
Declare convergence if $J(\theta)$ decreases by less than $\epsilon$ (a small threshold) in one iteration?
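One way to apply this check is to record $J(\theta)$ after every update and stop once the decrease falls below a threshold. The sketch below is only an illustration: the toy data, alpha, and the threshold epsilon = 1e-3 are assumed values, and the variable names are hypothetical.

```octave
% Toy data (assumed): x0 = 1 column plus one feature, generated from y = 1 + 2x.
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];
m = length(y);

alpha = 0.1;        % learning rate (assumed)
epsilon = 1e-3;     % convergence threshold (assumed)
max_iters = 10000;  % safety cap on the number of iterations

cost = @(t) sum((X * t - y) .^ 2) / (2 * m);
theta = zeros(2, 1);
J_history = cost(theta);   % J before any update

for iter = 1:max_iters
  theta = theta - alpha * (X' * (X * theta - y)) / m;
  J_history(end + 1) = cost(theta);
  decrease = J_history(end - 1) - J_history(end);
  if decrease < 0
    disp('J increased: alpha is probably too large');
    break;
  elseif decrease < epsilon
    break;   % J changed by less than epsilon in one iteration
  end
end

fprintf('stopped after %d iterations, J = %g\n', iter, J_history(end));
% To inspect the curve: plot(0:numel(J_history) - 1, J_history)
```

Plotting J_history against the iteration number gives the kind of curve shown on the slide, which is often easier to judge by eye than a single threshold.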
Summary: Choosing $\alpha$
- If $\alpha$ is too small: slow convergence.
- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.
To choose $\alpha$, try a range of values, e.g. …, 0.001, 0.01, 0.1, 1, …
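To see both failure modes side by side, one can run a fixed number of iterations for several learning rates and compare the resulting cost. In this sketch the candidate values, the toy data, and the choice of 50 iterations are all assumptions made for illustration.

```octave
% Toy data (assumed): x0 = 1 column plus one feature, generated from y = 1 + 2x.
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];
m = length(y);
cost = @(t) sum((X * t - y) .^ 2) / (2 * m);

alphas = [0.001 0.01 0.1 1];   % candidate learning rates (assumed)
final_J = zeros(size(alphas));

for k = 1:numel(alphas)
  theta = zeros(2, 1);
  for iter = 1:50              % same small budget for every candidate
    theta = theta - alphas(k) * (X' * (X * theta - y)) / m;
  end
  final_J(k) = cost(theta);
  fprintf('alpha = %g -> J after 50 iterations = %g\n', alphas(k), final_J(k));
end
```

On this data the smallest rate has barely reduced the cost after 50 iterations, while the largest one diverges and ends with a cost far above its starting value.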
Housing prices prediction
Polynomial regression
[Plot: Price (y) versus Size (x)]
Choice of features
[Plot: Price (y) versus Size (x)]
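One common way to fit a curve with the linear regression machinery is to treat powers of a feature as additional features. The sketch below assumes a cubic in house size, which is an illustrative choice and not necessarily the model on the original slide. Because size, size^2, and size^3 differ by orders of magnitude, feature scaling matters even more here.

```octave
% House sizes (feet^2) from the housing table.
sz = [2104; 1416; 1534; 852];

% Polynomial features (assumed cubic model): x1 = size, x2 = size^2, x3 = size^3.
P = [sz, sz .^ 2, sz .^ 3];

% Mean-normalize and scale each column by its range, as for any other feature.
mu = mean(P);
s  = max(P) - min(P);
P_norm = (P - mu) ./ s;

% Design matrix with the x0 = 1 column; usable with gradient descent as before.
X = [ones(numel(sz), 1), P_norm];
disp(X);
```

The model is still linear in the parameters $\theta$, so neither the cost function nor the gradient descent update changes.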
Gradient Descent
Normal equation: Method to solve for $\theta$ analytically.
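Intuition: If 1D ($\theta \in \mathbb{R}$), minimize $J(\theta)$ by setting $\frac{d}{d\theta} J(\theta) = 0$ and solving for $\theta$.
In general ($\theta \in \mathbb{R}^{n+1}$): set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ (for every $j$).
Solve for $\theta_0, \theta_1, \dots, \theta_n$.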
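As a worked instance of the 1D case, assume the cost is a quadratic in a single parameter, with coefficients $a > 0$, $b$, $c$ introduced here purely for illustration:

```latex
J(\theta) = a\theta^2 + b\theta + c, \qquad a > 0
\frac{d}{d\theta} J(\theta) = 2a\theta + b = 0
\quad\Longrightarrow\quad
\theta = -\frac{b}{2a}
```

Since $a > 0$, the parabola opens upward and this stationary point is the minimum. The multivariate case follows the same idea, with one equation per parameter.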
Examples:
$x_0$   Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1       2104            5                    1                  45                    460
1       1416            3                    2                  40                    232
1       1534            3                    2                  30                    315
1       852             2                    1                  36                    178
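$m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features.
Stack the examples as the rows of the $m \times (n+1)$ design matrix $X$, and the targets in the $m \times 1$ vector $y$.
E.g. If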
$\theta = (X^T X)^{-1} X^T y$
$(X^T X)^{-1}$ is inverse of matrix $X^T X$.
Octave: pinv(X'*X)*X'*y
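The Octave one-liner can be tried on a small data set where the answer is known in advance. The toy data below is an assumption: it is generated from y = 1 + 2x, so the solution should come out close to [1; 2].

```octave
% Toy data (assumed): x0 = 1 column plus one feature, generated from y = 1 + 2x.
X = [1 1; 1 2; 1 3; 1 4];
y = [3; 5; 7; 9];

% Normal equation: theta = (X'X)^(-1) X'y, using the pseudo-inverse pinv.
theta = pinv(X' * X) * X' * y;

disp(theta');   % expected to be close to [1 2]
```

pinv computes the pseudo-inverse, so the expression still returns a least-squares solution when X' * X is singular, for example when there are more features than training examples.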
$m$ training examples, $n$ features.
Gradient Descent
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.
Normal Equation
• No need to choose $\alpha$.
• Don’t need to iterate.
• Need to compute $(X^T X)^{-1}$.
• Slow if $n$ is very large.
Notes on Supervised learning and Regression
http://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf
Octave
http://www.gnu.org/software/octave/
Wiki: http://www.octave.org/wiki/index.php?title=Main_Page
documentation:
http://www.gnu.org/software/octave/doc/interpreter/
Exercise for next class:
1. Download and install Octave (Alternative: if you have MATLAB, you can use it instead.)
2. Verify that it is working by typing in an Octave command window:
x = [0 1 2 3]
y = [0 2 4 6]
plot(x,y)
This example defines two vectors, x and y, and should display a plot showing a straight line (the line
y=2x). If you get an error at this point, it may be that gnuplot is not installed or cannot access
your display. If you are unable to get this to work, you can still do the rest of this exercise,
because it does not involve any plotting (just restart Octave). You might refer to the Octave wiki
for installation help, but if you are stuck, you can get some help troubleshooting this on Friday
afternoon 3-4pm in the software engineering lab (Mendel 159).
3. Create a few matrices and vectors, e.g.:
A = [1 2; 3 4; 5 6]
V = [3 5 -1 0 7]
4. Try some of the elementary matrix and vector operations from our linear algebra slides (adding,
multiplying between matrices, vectors and scalars)
5. Print out a log of your session