Uploaded by Sankhadeep Chatterjee

Introduction to scikit-learn

Introduction to scikit-learn package:
Prof. Sankhadeep Chatterjee
Objective:
Familiarization with scikit-learn package
Building Classifiers / Prediction Model
Requirement:
Packages: NumPy, SciPy, and matplotlib
scikit-learn package (version >= 0.19.1)
Suggested:
Anaconda (version 3)
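Before running the exercises, it can help to confirm which versions of the required packages are installed. This is a minimal sketch, assuming Python 3.8 or later for `importlib.metadata`. It reads each package's version without importing the package and reports a missing package instead of failing.

```python
# Report the installed version of each package listed under "Requirement".
# importlib.metadata (Python 3.8+) reads the version without importing the package.
from importlib.metadata import PackageNotFoundError, version

installed = {}
for dist in ["numpy", "scipy", "matplotlib", "scikit-learn"]:
    try:
        installed[dist] = version(dist)
    except PackageNotFoundError:
        installed[dist] = None  # package is missing from this environment

for dist, ver in installed.items():
    print(f"{dist}: {ver if ver else 'not installed'}")
```

Version 0.20 of scikit-learn removed the `sklearn.cross_validation` module. On 0.20 or later, `train_test_split` must therefore be imported from `sklearn.model_selection`, which has been available since 0.18.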
1. Write a Python program to load the Iris dataset and show its type and dimensions using the scikit-learn library
In [3]: from sklearn.datasets import load_iris
In [4]: #type(datasets)
type(load_iris)
Out[4]: function
In [5]: iris = load_iris()
In [6]: type(iris)
Out[6]: sklearn.utils.Bunch
In [7]: print(iris.data.shape)
(150, 4)
In [8]: type(iris.data)
Out[8]: numpy.ndarray
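`load_iris()` returns a `Bunch`, which is a dictionary subclass whose keys can also be read as attributes. The sketch below is a minimal way to inspect what the object carries besides `data` and `target`. The exact set of keys differs between scikit-learn versions, so the code prints them rather than assuming a list.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# A Bunch is a dict whose keys are also available as attributes.
print(sorted(iris.keys()))
print(iris.feature_names)          # names of the 4 feature columns
print(iris.data.shape)             # (n_samples, n_features)
print(iris.target.shape)           # one integer label per sample
print(iris["data"] is iris.data)   # dict-style and attribute access return the same array
```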
In [10]: print(iris.target_names)
print(iris.target)
['setosa' 'versicolor' 'virginica']
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
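The printed target vector shows that the 150 samples are stored grouped by class. The short sketch below counts the labels with NumPy's `np.bincount` and reloads the dataset so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# bincount[i] is the number of samples whose label is i
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(f"{name}: {count}")
```

Because the rows are in class order, a split that simply took the last 50 rows as a test set would contain only virginica samples. `train_test_split`, used in exercise 3, shuffles the rows before splitting.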
2. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and predict
the class of an unknown sample
In [11]: x = iris.data
y = iris.target
In [12]: from sklearn.neighbors import KNeighborsClassifier
In [13]: knn = KNeighborsClassifier(n_neighbors=1)
In [14]: knn.fit(x,y)
Out[14]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=1, p=2,
weights='uniform')
In [15]: knn.predict([[5, 3, 1, 0],[6,3,5,2]])
Out[15]: array([0, 2])
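The fitted model reports `metric='minkowski'` with `p=2`, which is the Euclidean distance. With `n_neighbors=1`, prediction reduces to copying the label of the single closest training sample. The sketch below is a minimal brute-force NumPy version for comparison. The helper `predict_1nn` is a hypothetical name for illustration and is not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x, y = iris.data, iris.target
x_new = np.array([[5, 3, 1, 0], [6, 3, 5, 2]], dtype=float)


def predict_1nn(x_train, y_train, queries):
    """Give each query the label of its closest training sample (Euclidean distance)."""
    # distances[i, j] is the Euclidean distance between queries[i] and x_train[j]
    distances = np.linalg.norm(queries[:, None, :] - x_train[None, :, :], axis=2)
    return y_train[np.argmin(distances, axis=1)]


manual = predict_1nn(x, y, x_new)
library = KNeighborsClassifier(n_neighbors=1).fit(x, y).predict(x_new)
print(manual, library)
```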
In [16]: knn5 = KNeighborsClassifier(n_neighbors=5)
In [17]: x_new = [[5, 3, 1, 0],[6,3,5,2]]
In [18]: knn5.fit(x,y)
knn5.predict(x_new)
Out[18]: array([0, 2])
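With `n_neighbors=5` and `weights='uniform'`, each prediction is a majority vote among the five nearest training samples. The sketch below shows that vote in plain NumPy. The helper `predict_knn` is hypothetical and only illustrates the idea.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x, y = iris.data, iris.target
x_new = np.array([[5, 3, 1, 0], [6, 3, 5, 2]], dtype=float)


def predict_knn(x_train, y_train, queries, k):
    """Majority vote among the k nearest training samples (uniform weights)."""
    distances = np.linalg.norm(queries[:, None, :] - x_train[None, :, :], axis=2)
    # Indices of the k smallest distances in each row
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = y_train[nearest]
    # bincount(...).argmax() picks the most frequent label (the smallest label on a tie)
    return np.array([np.bincount(row).argmax() for row in votes])


manual = predict_knn(x, y, x_new, k=5)
library = KNeighborsClassifier(n_neighbors=5).fit(x, y).predict(x_new)
print(manual, library)
```

When several training samples lie at exactly the k-th distance, this sketch may select a different subset than scikit-learn's neighbour search does. On other data the two could therefore occasionally disagree.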
3. Write a Python program to create separate ndarrays for features and targets from the data using the
scikit-learn library. Use the train_test_split function to split the dataset into training and testing sets
In [19]: # sklearn.cross_validation was deprecated in 0.18 and removed in 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
#help(train_test_split)
In [20]: x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.4,random_state = 4)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train,y_train)
Out[20]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=5, p=2,
weights='uniform')
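The features and targets can also be obtained directly as two ndarrays by calling `load_iris(return_X_y=True)`. The sketch below does this, checks the sizes that `test_size=0.4` produces, and shows the optional `stratify` argument. That argument keeps the equal class proportions in both splits. It is an addition for comparison and is not used in the exercises.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True returns (features, targets) instead of a Bunch
x, y = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=4)
print(x_train.shape, x_test.shape)   # 40% of 150 samples go to the test set
print(np.bincount(y_test))           # class counts in the (unstratified) test split

# stratify=y preserves the 50/50/50 class balance in both splits
xs_train, xs_test, ys_train, ys_test = train_test_split(
    x, y, test_size=0.4, random_state=4, stratify=y)
print(np.bincount(ys_test))
```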
4. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and test it
using the test dataset. Find the accuracy using the accuracy_score() function
In [21]: from sklearn import metrics
In [22]: type(metrics)
Out[22]: module
In [23]: y_pred = knn.predict(x_test)
metrics.accuracy_score(y_test,y_pred)
Out[23]: 0.96666666666666667
In [24]: metrics.confusion_matrix(y_test,y_pred)
Out[24]: array([[25, 0, 0],
[ 0, 15, 2],
[ 0, 0, 18]], dtype=int64)
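In scikit-learn's confusion matrix, rows are true classes and columns are predicted classes. The 2 in the second row of the matrix above is therefore two versicolor samples that were predicted as virginica. Correct predictions lie on the diagonal, so accuracy equals the trace divided by the total: (25 + 15 + 18) / 60 = 58 / 60 ≈ 0.9667, which agrees with `accuracy_score`. The sketch below checks this identity on a fresh run of the same split.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=4)
y_pred = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train).predict(x_test)

cm = metrics.confusion_matrix(y_test, y_pred)
acc_from_cm = np.trace(cm) / cm.sum()        # correct predictions / all predictions
acc_direct = np.mean(y_pred == y_test)       # fraction of matching labels
print(acc_from_cm, acc_direct, metrics.accuracy_score(y_test, y_pred))
```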
5. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and find the
optimal value of k by plotting the accuracies for different values of k using the matplotlib library
In [25]: k_range = range(1,30)
score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train,y_train)
    y_pred = knn.predict(x_test)
    score.append(metrics.accuracy_score(y_test,y_pred))
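Besides reading the optimum off the plot, the best k can be picked from the `score` list directly. Accuracy on a 60-sample test set can only take multiples of 1/60, so several k values can share the top score. The minimal sketch below reports the first such k and lists all of them.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=4)

k_range = range(1, 30)
score = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    score.append(metrics.accuracy_score(y_test, knn.predict(x_test)))

# np.argmax returns the first index of the maximum, i.e. the smallest best k
best_k = k_range[int(np.argmax(score))]
# All k values that reach the same top accuracy
tied_ks = [k for k, s in zip(k_range, score) if s == max(score)]
print(best_k, max(score), tied_ks)
```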
In [26]: import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(k_range,score)
plt.xlabel('Values of k')
plt.ylabel('Accuracy')
Out[26]: Text(0,0.5,'Accuracy')
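Accuracy on one train/test split depends on which samples happen to fall into the test set. A common alternative, swapped in here rather than taken from the exercise, is to average accuracy over k-fold cross-validation using `cross_val_score`. This is a minimal sketch assuming 10 folds. For a classifier, an integer `cv` makes scikit-learn use stratified folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)

k_range = range(1, 30)
cv_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 10 stratified folds for this k
    cv_scores.append(cross_val_score(knn, x, y, cv=10, scoring="accuracy").mean())

best_k = k_range[int(np.argmax(cv_scores))]
print(best_k, max(cv_scores))
```

The `cv_scores` list can be plotted with `plt.plot(k_range, cv_scores)` in the same way as `score` in the cell above.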