Lecture 9 powerpoint

Linear regression by gradient descent
(with thanks to Prof. Ng’s machine learning course)
Extending single-variable to multivariate linear regression
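$h_\Theta(x) = \Theta_0 + \Theta_1 x$
$h_\Theta(x) = \Theta_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n$
e.g. start with house prices versus sq ft and then move to house prices versus sq ft, number of bedrooms, age of house
$h_\Theta(x) = \Theta_0 x_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \dots + \Theta_n x_n$
With $x_0 = 1$
$h_\Theta(x) = \Theta^T x$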
Cost function
$J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)^2$
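Gradient descent:
Repeat {
$\Theta_j = \Theta_j - \alpha \, \partial J(\Theta)/\partial \Theta_j$
} for all j simultaneously
Substituting the derivative of J gives
$\Theta_j = \Theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
so that
$\Theta_0 = \Theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$ (where $x_0^{(i)} = 1$)
$\Theta_1 = \Theta_1 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)x_1^{(i)}$
$\Theta_2 = \Theta_2 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)x_2^{(i)}$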
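The factor $x_j^{(i)}$ in each update comes from differentiating the cost function. A short derivation, assuming the $J(\Theta)$ and $h_\Theta(x)$ defined above:

```latex
\begin{aligned}
\frac{\partial J(\Theta)}{\partial \Theta_j}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\Theta(x^{(i)}) - y^{(i)}\right)
     \frac{\partial h_\Theta(x^{(i)})}{\partial \Theta_j} \\
  &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)},
  \qquad\text{since } \frac{\partial h_\Theta(x)}{\partial \Theta_j} = x_j .
\end{aligned}
```

Stacking the examples as rows of a matrix $X$, the same result can be written as $\nabla J(\Theta) = \frac{1}{m} X^T (X\Theta - y)$.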
What the Equations Mean
The matrices y (m × 1) and x (m × 4, with x0 = 1 in the first column):

y        x
PRICE    x0   SQFT   AGE   FEATS
2050     1    2650   13    7
2150     1    2664   6     5
2150     1    2921   3     6
1999     1    2580   4     4
1900     1    2580   4     4
1800     1    2774   2     4
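As a rough illustration of how these matrices are used, the sketch below enters the same housing data in MATLAB and evaluates the hypothesis for every row at once as X*theta. The variable name `features` and the all-zero starting `theta` are assumptions made for the example, not part of the lecture.

```matlab
% Housing data copied from the matrices above.
y = [2050; 2150; 2150; 1999; 1900; 1800];   % PRICE, m-by-1
features = [2650 13 7;
            2664  6 5;
            2921  3 6;
            2580  4 4;
            2580  4 4;
            2774  2 4];                     % columns: SQFT, AGE, FEATS
m = length(y);                              % number of training examples
X = [ones(m, 1), features];                 % prepend the x0 = 1 column
theta = zeros(size(X, 2), 1);               % illustrative starting point (assumption)
h = X * theta;                              % hypothesis for all m examples at once
```

Each row of X multiplied by theta is one evaluation of Θ0x0 + Θ1x1 + Θ2x2 + Θ3x3, so a single matrix product replaces the sum over features.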
Feature Scaling
We would like all features to fall roughly into the range $-1 \le x_i \le +1$.
Replace $x_i$ with $(x_i - \mu_i)/s_i$, where $\mu_i$ is the mean of feature i and $s_i$ is its range (maximum minus minimum); alternatively, use the mean and the standard deviation.
Don’t scale x0.
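A minimal sketch of the mean-and-range variant, assuming the SQFT and AGE columns of the housing data as input. The variable names `s` and `X_scaled` are illustrative.

```matlab
% Mean-and-range scaling on the SQFT and AGE columns.
X = [2650 13; 2664 6; 2921 3; 2580 4; 2580 4; 2774 2];
m = size(X, 1);
mu = mean(X);                        % column means
s = max(X) - min(X);                 % column ranges (maximum minus minimum)
X_scaled = (X - repmat(mu, m, 1)) ./ repmat(s, m, 1);
X_scaled = [ones(m, 1), X_scaled];   % add x0 = 1 afterwards so it stays unscaled
```

Dividing by the range guarantees every scaled value lies between -1 and +1, whereas dividing by the standard deviation only keeps most values roughly in that range.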
Converting results back
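One way to convert back, assuming each feature was scaled as (xj - µj)/σj: since hΘ(x) = Θ0 + Σ Θj (xj - µj)/σj, the coefficient on the original feature xj is Θj/σj and the intercept becomes Θ0 - Σ Θj µj/σj. The sketch below checks this on made-up data generated from y = 5 + 2 x1 + 3 x2. The data and the names `theta_scaled` and `theta_raw` are hypothetical, and the fit uses MATLAB's backslash solve only to keep the example short.

```matlab
% Hypothetical data generated from y = 5 + 2*x1 + 3*x2.
Xraw = [1 10; 2 30; 3 20; 4 50; 5 40];
y = 5 + 2*Xraw(:,1) + 3*Xraw(:,2);
m = size(Xraw, 1);
mu = mean(Xraw);
sigma = std(Xraw);
Xn = [ones(m,1), (Xraw - repmat(mu,m,1)) ./ repmat(sigma,m,1)];
theta_scaled = (Xn' * Xn) \ (Xn' * y);   % fit on scaled features (any fitting method would do)
% Map the coefficients back to the original, unscaled features.
theta_raw = zeros(size(theta_scaled));
theta_raw(2:end) = theta_scaled(2:end) ./ sigma';
theta_raw(1) = theta_scaled(1) - sum(theta_scaled(2:end) .* mu' ./ sigma');
```

An alternative is to leave Θ in scaled form and instead scale each new input with the same mu and sigma before predicting.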
Learning Rate and Debugging
With a small enough α, J should decrease on every iteration: this is the first test. An α that is too large can overshoot the minimum and climb the other side of the curve.
With α too small, convergence is too slow.
Try a series of α values, say 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
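The sweep over α can be sketched as a loop that runs a few iterations for each value and records whether J ever went up. The data below is made up and already scaled, and the final value α = 3 is added deliberately as an example of a rate that is too large for this data. The names `decreasing` and `final_J` are illustrative.

```matlab
% Hypothetical, already-scaled data: one feature plus the x0 column of ones.
X = [ones(5,1), [-1; -0.5; 0; 0.5; 1]];
y = [1; 2; 3; 4; 5];
m = length(y);
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1 3];   % last value is deliberately too large
num_iters = 20;
decreasing = false(size(alphas));
final_J = zeros(size(alphas));
for k = 1:numel(alphas)
    theta = zeros(2, 1);
    J = zeros(num_iters, 1);
    for iter = 1:num_iters
        err = X*theta - y;
        theta = theta - (alphas(k)/m) * (X' * err);   % simultaneous update
        err = X*theta - y;
        J(iter) = (err' * err) / (2*m);               % cost after this update
    end
    decreasing(k) = all(diff(J) <= 0);   % first test: J never goes up
    final_J(k) = J(end);
end
```

Among the values that pass the test, the one with the smallest `final_J` is a reasonable candidate, since it has made the most progress in the same number of iterations.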
Matlab Implementation
Feature Normalization
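function [X_norm, mu, sigma] = featureNormalize(X)
% Scale each column of X to zero mean and unit standard deviation.
% Pass only the feature columns; add the x0 column of ones afterwards.
mu = mean(X);            % 1-by-n row of column means
sigma = std(X);          % 1-by-n row of column standard deviations
m = size(X,1);           % number of training examples
X_norm = X - repmat(mu,m,1);          % subtract the mean from every row
X_norm = X_norm ./ repmat(sigma,m,1); % divide every row by the standard deviation
end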
Gradient Descent
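function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% Batch gradient descent for multivariate linear regression.
m = length(y);                     % number of training examples
J_history = zeros(num_iters, 1);   % cost after each iteration
for iter = 1:num_iters
    A = (X*theta - y);                  % m-by-1 vector of prediction errors
    deltatheta = (alpha/m)*(A'*X);      % row vector: step for every theta_j at once
    theta = theta - deltatheta';        % simultaneous update of all theta_j
    J_history(iter) = computeCostMulti(X, y, theta);
end
end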
Cost Function
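function J = computeCostMulti(X, y, theta)
% Cost J(theta) = (1/2m) * sum of squared prediction errors.
m = length(y);             % number of training examples
A = (X*theta - y);         % m-by-1 vector of prediction errors
J = (1/(2*m))*(A'*A);      % A'*A is the sum of squared errors
end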
Polynomials
$h_\Theta(x) = \Theta_0 + \Theta_1 x + \Theta_2 x^2 + \Theta_3 x^3$
Replace $x$ with $x_1$, $x^2$ with $x_2$, $x^3$ with $x_3$
Scale the $x$, $x^2$, $x^3$ values
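A minimal sketch of this substitution, assuming a made-up input vector `x`. The powers of x become ordinary feature columns, which are then scaled with the mean and standard deviation before the x0 column is added.

```matlab
x = (1:10)';                         % hypothetical single input variable
X_poly = [x, x.^2, x.^3];            % x1 = x, x2 = x^2, x3 = x^3
m = size(X_poly, 1);
mu = mean(X_poly);
sigma = std(X_poly);
X_poly = (X_poly - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);   % scale each power
X = [ones(m, 1), X_poly];            % x0 = 1 is not scaled
```

Scaling matters more here than usual: for x between 1 and 10, the cube runs from 1 to 1000, so the unscaled columns would differ in size by orders of magnitude.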
Normal Equations
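$\Theta = (A^T A)^{-1} A^T y$
For a polynomial of degree n, build the matrix A column by column, highest power first:
A(:,n+1) = ones(length(x),1,class(x));
for j = n:-1:1
    A(:,j) = x.*A(:,j+1);
end
Then solve for the coefficients:
W = A'*A;
Y = A'*y;
theta = W\Y;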
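As a check on the normal equations, the sketch below fits a quadratic to made-up points that lie exactly on y = 1 + 2x + 3x², so the solution should recover the coefficients 3, 2, 1 (highest power first, following the column order of A). The data and the choice n = 2 are assumptions for the example.

```matlab
x = (0:5)';                          % hypothetical inputs
y = 1 + 2*x + 3*x.^2;                % points on an exact quadratic
n = 2;                               % polynomial degree
A = zeros(length(x), n+1);
A(:, n+1) = ones(length(x), 1);      % last column is x^0 = 1
for j = n:-1:1
    A(:, j) = x .* A(:, j+1);        % each column is x times the one to its right
end
theta = (A' * A) \ (A' * y);         % theta(1) multiplies x^n, theta(end) is the constant
```

The backslash operator solves the linear system directly rather than forming the inverse of A'*A, and no feature scaling or learning rate is needed.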