Activity-lr-1
Linear Regression Example
This example uses the truck sales dataset to illustrate ordinary least-squares (OLS) linear regression. The plot shows the line learned by linear regression, which minimizes the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. We also compute the mean squared error and the variance score for the model.
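For reference, with a single predictor OLS has a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). A minimal NumPy sketch of this formula, on toy data and independent of scikit-learn, for illustration only:

    import numpy as np

    # Toy data for illustration only
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

    # Closed-form simple OLS: slope = cov(x, y) / var(x)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    intercept = y.mean() - slope * x.mean()
    print("y = %f + %fx" % (intercept, slope))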
In [1]: %matplotlib inline
        import matplotlib.pyplot as plt
        import numpy as np
        import pandas as pd
        from sklearn import linear_model

        # Get data
        df = pd.read_csv(filepath_or_buffer='trucks.csv', header=None)
        data = df.iloc[:, :].values
        X = data[:, 0].reshape(-1, 1)
        Y = data[:, 1].reshape(-1, 1)

        # Train the model
        regr = linear_model.LinearRegression()
        regr.fit(X, Y)
        slope = regr.coef_[0][0]
        intercept = regr.intercept_[0]
        print("y = %f + %fx" % (intercept, slope))
        print("Mean squared error: %f" % np.mean((regr.predict(X) - Y) ** 2))
        # Explained variance score: 1 is perfect prediction
        print('Variance score: %f' % regr.score(X, Y))

        # Plot outputs
        plt.scatter(X, Y, color='black')
        plt.plot(X, regr.predict(X), color='red', linewidth=1)
        plt.show()
y = 0.434585 + 0.851144x
Mean squared error: 0.011812
Variance score: 0.997083
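The variance score printed above is the coefficient of determination, R² = 1 − SSE/SST. As a quick sanity check it can be recomputed by hand; a sketch assuming X, Y, and regr from the cell above:

    # R^2 by hand: 1 - (residual sum of squares) / (total sum of squares)
    sse = np.sum((regr.predict(X) - Y) ** 2)
    sst = np.sum((Y - Y.mean()) ** 2)
    print("R^2 = %f" % (1 - sse / sst))  # should match regr.score(X, Y)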
In the cell below, we load a subset of the Iris dataset from the UCI repository, specifically all the samples for the "Iris Setosa" flower. The function model fits the OLS model for a pair of features in the data and computes the sum of squared errors (SSE) and the variance score for that model. The parameters v1 and v2 are the names of the X and Y variables.
In [2]: import numpy as np
        import pandas as pd
        from sklearn import linear_model

        df = pd.read_csv(
            filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
            header=None)
        data = df.iloc[:, :].values
        # Keep only the "Iris-setosa" samples and drop the class label column
        data = data[data[:, 4] == "Iris-setosa"][:, :4]

        def model(X, Y, v1="A", v2="B"):
            X = X.reshape(-1, 1)
            Y = Y.reshape(-1, 1)
            regr = linear_model.LinearRegression()
            regr.fit(X, Y)
            slope = regr.coef_[0][0]
            intercept = regr.intercept_[0]
            print("%s = %f + %fx%s" % (v2, intercept, slope, v1))
            sse = np.sum((regr.predict(X) - Y) ** 2)
            print("Sum of squared errors: %f" % sse)
            # Explained variance score: 1 is perfect prediction
            print('Variance score: %f' % regr.score(X, Y))
            return slope, intercept, sse, v1, v2
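As a quick sanity check before the exercise, the filtered array should contain the 50 "Iris-setosa" samples and their 4 feature columns; a sketch assuming the cell above has run:

    print(data.shape)  # expected: (50, 4)
    print(data[:3])    # first few rows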
Exercise
The samples have 4 features. For each combination of features (each ordered pair of features), consider one variable as the predictor and the other as the response, and use the function model to find the OLS model that best fits the data. Report the model with the smallest SSE.
In [3]: # The four feature columns: sepal length, sepal width, petal length, petal width
        one = data[:, 0]
        two = data[:, 1]
        three = data[:, 2]
        four = data[:, 3]

        print(model(one, two))
        print(model(one, three))
        print(model(one, four))
        print(model(two, three))
        print(model(two, four))
        print(model(three, four))
        print(model(two, one))
        print(model(three, one))
        print(model(four, one))
        print(model(three, two))
        print(model(four, two))
        print(model(four, three))
B = -0.623012 + 0.807234xA
Sum of squared errors: 3.146569
Variance score: 0.557681
(0.807233665122696, -0.623011727604216, 3.146569429387999, 'A', 'B')
B = 0.813768 + 0.129891xA
Sum of squared errors: 1.372483
Variance score: 0.069630
(0.12989060806149594, 0.8137676160441513, 1.3724825071449684, 'A', 'B')
B = -0.180937 + 0.084886xA
Sum of squared errors: 0.519331
Variance score: 0.077892
(0.08488551624453856, -0.18093689432016008, 0.5193311652048225, 'A', 'B')
B = 1.188976 + 0.080463xA
Sum of squared errors: 1.429143
Variance score: 0.031221
(0.08046332480530793, 1.1889763558154574, 1.4291427928814417, 'A', 'B')
B = -0.025258 + 0.078776xA
Sum of squared errors: 0.519054
Variance score: 0.078385
(0.07877646265006044, -0.025257949337906593, 0.5190536703309061, 'A', 'B')
B = -0.033080 + 0.189262xA
Sum of squared errors: 0.510358
Variance score: 0.093825
(0.18926247288503256, -0.03308026030368766, 0.5103579175704992, 'A', 'B')
B = 2.644660 + 0.690854xA
Sum of squared errors: 2.692927
Variance score: 0.557681
(0.6908543956816328, 2.6446596755601792, 2.6929269869830503, 'A', 'B')
B = 4.221204 + 0.536063xA
Sum of squared errors: 5.664281
Variance score: 0.069630
(0.536062906724512, 4.221203904555315, 5.664281453362259, 'A', 'B')
B = 4.782102 + 0.917614xA
Sum of squared errors: 5.613977
Variance score: 0.077892
(0.9176136363636359, 4.782102272727273, 5.613977272727274, 'A', 'B')
B = 2.849946 + 0.388015xA
Sum of squared errors: 6.891700
Variance score: 0.031221
(0.3880151843817785, 2.8499457700650765, 6.891700108459869, 'A', 'B')
B = 3.175213 + 0.995028xA
Sum of squared errors: 6.556186
Variance score: 0.078385
(0.9950284090909087, 3.1752130681818183, 6.556186079545453, 'A', 'B')
B = 1.343040 + 0.495739xA
Sum of squared errors: 1.336790
Variance score: 0.093825
(0.4957386363636363, 1.3430397727272727, 1.3367897727272722, 'A', 'B')
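Rather than writing out all twelve calls by hand, the ordered pairs can also be enumerated with itertools.permutations. A sketch assuming the data array and model function defined above (the numeric column labels passed as v1 and v2 are an illustrative choice):

    from itertools import permutations

    # Fit a model for every ordered pair of the 4 feature columns
    results = []
    for i, j in permutations(range(4), 2):
        slope, intercept, sse, v1, v2 = model(data[:, i], data[:, j],
                                              v1=str(i + 1), v2=str(j + 1))
        results.append((sse, i + 1, j + 1))

    best_sse, best_x, best_y = min(results)
    print("Smallest SSE: column %d as predictor of column %d (SSE = %f)"
          % (best_x, best_y, best_sse))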
In [4]: sse = [0] * 12
        print("One Two")
        a, b, sse[0], d, e = model(one, two)
        print("one,three")
        a, b, sse[1], d, e = model(one, three)
        print("one,four")
        a, b, sse[2], d, e = model(one, four)
        print("two,three")
        a, b, sse[3], d, e = model(two, three)
        print("two,four")
        a, b, sse[4], d, e = model(two, four)
        print("three,four")
        a, b, sse[5], d, e = model(three, four)
        print("two,one")
        a, b, sse[6], d, e = model(two, one)
        print("three,one")
        a, b, sse[7], d, e = model(three, one)
        print("four,one")
        a, b, sse[8], d, e = model(four, one)
        print("three,two")
        a, b, sse[9], d, e = model(three, two)
        print("four,two")
        a, b, sse[10], d, e = model(four, two)
        print("four, three")
        a, b, sse[11], d, e = model(four, three)
One Two
B = -0.623012 + 0.807234xA
Sum of squared errors: 3.146569
Variance score: 0.557681
one,three
B = 0.813768 + 0.129891xA
Sum of squared errors: 1.372483
Variance score: 0.069630
one,four
B = -0.180937 + 0.084886xA
Sum of squared errors: 0.519331
Variance score: 0.077892
two,three
B = 1.188976 + 0.080463xA
Sum of squared errors: 1.429143
Variance score: 0.031221
two,four
B = -0.025258 + 0.078776xA
Sum of squared errors: 0.519054
Variance score: 0.078385
three,four
B = -0.033080 + 0.189262xA
Sum of squared errors: 0.510358
Variance score: 0.093825
two,one
B = 2.644660 + 0.690854xA
Sum of squared errors: 2.692927
Variance score: 0.557681
three,one
B = 4.221204 + 0.536063xA
Sum of squared errors: 5.664281
Variance score: 0.069630
four,one
B = 4.782102 + 0.917614xA
Sum of squared errors: 5.613977
Variance score: 0.077892
three,two
B = 2.849946 + 0.388015xA
Sum of squared errors: 6.891700
Variance score: 0.031221
four,two
B = 3.175213 + 0.995028xA
Sum of squared errors: 6.556186
Variance score: 0.078385
four, three
B = 1.343040 + 0.495739xA
Sum of squared errors: 1.336790
Variance score: 0.093825
In [5]: # The smallest SSE is obtained with the third column as the independent
        # variable and the fourth column as the dependent variable: SSE = 0.510
        print("The minimum SSE is obtained for columns three as independent and four as dependent:", min(sse))

The minimum SSE is obtained for columns three as independent and four as dependent: 0.5103579175704992
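The same conclusion can be reached programmatically from the sse list built in cell 4; a sketch assuming that list is in scope:

    # Locate the smallest SSE among the twelve fits
    best = int(np.argmin(sse))
    print("Smallest SSE: %f (fit #%d in the order above)" % (sse[best], best + 1))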
In [6]: print("Model with the least SSE")
model(three,four)
Model with the least SSE
B = -0.033080 + 0.189262xA
Sum of squared errors: 0.510358
Variance score: 0.093825
Out[6]: (0.18926247288503256, -0.03308026030368766, 0.5103579175704992, 'A', 'B')