Linear Regression by Gradient Descent
(with thanks to Prof. Ng's machine learning course)

Extending single-variable linear regression to the multivariate case:

  h_Θ(x) = Θ_0 + Θ_1 x                                        (one variable)
  h_Θ(x) = Θ_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n    (n variables)

e.g. start with house price versus square footage, then move to house price versus square footage, number of bedrooms, and age of the house.

With x_0 = 1:

  h_Θ(x) = Θ_0 x_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n = Θᵀx

Cost Function

  J(Θ) = (1/2m) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i))^2

Gradient Descent

  Repeat {
      Θ_j := Θ_j − α ∂J(Θ)/∂Θ_j
  }
  updating all Θ_j simultaneously. Taking the partial derivative gives

  Θ_j := Θ_j − (α/m) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i)) x_j^(i)

so, term by term:

  Θ_0 := Θ_0 − (α/m) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i)) x_0^(i)    (x_0^(i) = 1)
  Θ_1 := Θ_1 − (α/m) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i)) x_1^(i)
  Θ_2 := Θ_2 − (α/m) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i)) x_2^(i)

What the Equations Mean

The matrices, for six houses:

  y (PRICE)        X:  x_0   SQFT   AGE   FEATS
    2050                1    2650    13     7
    2150                1    2664     6     5
    2150                1    2921     3     6
    1999                1    2580     4     4
    1900                1    2580     4     4
    1800                1    2774     2     4

Feature Scaling

We would like all features to fall roughly into the range −1 ≤ x_i ≤ +1. Replace x_i with (x_i − µ_i)/s_i, where µ_i is the mean and s_i is the range; alternatively, use the mean and standard deviation. Don't scale x_0.

Converting results back: Θ is learned on the scaled features, so scale any new input with the same µ_i and s_i before predicting; since y is never scaled, the prediction comes out in the original units.

Learning Rate and Debugging

With a small enough α, J should decrease on every iteration: this is the first test. An α that is too large can take you past the minimum and up the other side of the curve; an α that is too small makes convergence too slow. Try a series of α values, say 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …

Matlab Implementation

Feature Normalization

  function [X_norm, mu, sigma] = featureNormalize(X)
  % Scale each column of X to zero mean and unit standard deviation,
  % returning mu and sigma so new inputs can be scaled the same way.
  mu = mean(X);                  % 1 x n row of column means
  sigma = std(X);                % 1 x n row of column standard deviations
  m = size(X, 1);                % number of training examples
  X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);
  end

Gradient Descent

  function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  m = length(y);                      % number of training examples
  J_history = zeros(num_iters, 1);    % cost after each iteration, for debugging
  for iter = 1:num_iters
      A = X*theta - y;                % m x 1 vector of residuals
      deltatheta = (alpha/m)*(A'*X);  % 1 x (n+1) step: all Θ_j updated simultaneously
      theta = theta - deltatheta';
      J_history(iter) = computeCostMulti(X, y, theta);
  end
  end

Cost Function

  function J = computeCostMulti(X, y, theta)
  m = length(y);              % number of training examples
  A = X*theta - y;            % m x 1 vector of residuals
  J = (1/(2*m))*(A'*A);       % sum of squared residuals over 2m
  end

Polynomials

  h_Θ(x) = Θ_0 + Θ_1 x + Θ_2 x^2 + Θ_3 x^3

Treat the powers as features: replace x with x_1, x^2 with x_2, and x^3 with x_3, then proceed as before. Scale the x, x^2, x^3 values.

Normal Equations

  Θ = (AᵀA)^(−1) Aᵀy

In Matlab (where A' is the transpose), build A and solve:

  A(:,n+1) = ones(length(x), 1, class(x));  % rightmost column: the x^0 = 1 terms
  for j = n:-1:1                            % for a polynomial: each column is x times the one to its right
      A(:,j) = x .* A(:,j+1);
  end
  W = A'*A;
  Y = A'*y;
  theta = W\Y;    % backslash solves W*theta = Y directly, without forming the inverse
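Putting It Together

As a minimal end-to-end sketch, the script below runs the whole pipeline on the six-house table above: normalize, prepend the x_0 = 1 column, descend, and predict. The settings α = 0.1 and 400 iterations, and the query house (2700 sq ft, 5 years old, 5 features), are illustrative assumptions, not values from the notes.

  X = [2650 13 7; 2664 6 5; 2921 3 6; 2580 4 4; 2580 4 4; 2774 2 4];
  y = [2050; 2150; 2150; 1999; 1900; 1800];

  [X_norm, mu, sigma] = featureNormalize(X);   % scale SQFT, AGE, FEATS
  m = size(X_norm, 1);
  Xd = [ones(m, 1) X_norm];                    % prepend the x_0 = 1 column (never scaled)
  theta = zeros(size(Xd, 2), 1);
  [theta, J_history] = gradientDescentMulti(Xd, y, theta, 0.1, 400);

  % Converting results back: scale a new house with the stored mu and sigma.
  x_new = ([2700 5 5] - mu) ./ sigma;          % hypothetical query house
  price = [1 x_new] * theta;                   % prediction, in the original units of y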
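To apply the debugging recipe from the learning-rate section, sweep the suggested α values and plot each J_history; with a good α the curve falls on every iteration, and a rising curve flags an α that is too large. A sketch, reusing Xd and y from the example above:

  alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
  num_iters = 100;
  figure; hold on;
  for k = 1:length(alphas)
      theta = zeros(size(Xd, 2), 1);     % restart from zero for each trial
      [~, J_hist] = gradientDescentMulti(Xd, y, theta, alphas(k), num_iters);
      plot(1:num_iters, J_hist);
  end
  xlabel('iteration'); ylabel('J(\Theta)');
  legend(arrayfun(@(a) sprintf('\\alpha = %g', a), alphas, 'UniformOutput', false));
  hold off;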
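Finally, a sketch of the normal-equations path for a polynomial, using the column-building loop above. The noisy cubic data is invented for illustration; the resulting Θ (highest power first, matching A's column order) should agree with Matlab's built-in polyfit(x, y, n).

  x = linspace(-1, 1, 20)';                 % hypothetical sample inputs
  y = 2 + 3*x - x.^3 + 0.05*randn(20, 1);   % noisy cubic, made up for the demo
  n = 3;                                    % polynomial degree
  A = zeros(length(x), n+1);
  A(:,n+1) = ones(length(x), 1, class(x));  % x^0 column
  for j = n:-1:1
      A(:,j) = x .* A(:,j+1);               % columns end up as x^3, x^2, x, 1
  end
  theta = (A'*A) \ (A'*y);                  % solve the normal equations
  % Check: theta' should be close to polyfit(x, y, n).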