Introduction to scikit-learn package: Prof. Sankhadeep Chatterjee

Objective:
    Familiarization with scikit-learn package
    Building Classifiers / Prediction Model

Requirement:
    Packages: NumPy, SciPy, and matplotlib
    scikit-learn package (version >= 0.19.1)
    Suggested: Anaconda (version 3)

1. Write a Python program to load the Iris dataset and show its type and dimensions using the scikit-learn library.

In [3]: from sklearn.datasets import load_iris

In [4]: #type(datasets)
        type(load_iris)

Out[4]: function

In [5]: iris = load_iris()

In [6]: type(iris)

Out[6]: sklearn.utils.Bunch

In [7]: print(iris.data.shape)

(150, 4)

In [8]: type(iris.data)

Out[8]: numpy.ndarray

In [10]: print(iris.target_names)
         print(iris.target)

['setosa' 'versicolor' 'virginica']
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

2. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and predict the class of an unknown sample.

In [11]: x = iris.data    # feature matrix, shape (150, 4)
         y = iris.target  # class labels 0, 1, 2

In [12]: from sklearn.neighbors import KNeighborsClassifier

In [13]: knn = KNeighborsClassifier(n_neighbors=1)

In [14]: knn.fit(x,y)

Out[14]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                    metric_params=None, n_jobs=1, n_neighbors=1, p=2,
                    weights='uniform')

In [15]: knn.predict([[5, 3, 1, 0],[6,3,5,2]])

Out[15]: array([0, 2])

In [16]: knn5 = KNeighborsClassifier(n_neighbors=5)

In [17]: x_new = [[5, 3, 1, 0],[6,3,5,2]]

In [18]: knn5.fit(x,y)
         knn5.predict(x_new)

Out[18]: array([0, 2])

3. Write a Python program to create separate ndarrays for features and targets from data using the scikit-learn library.
   Use the train_test_split function to split the dataset into training and testing sets.

In [19]: # sklearn.cross_validation was deprecated in version 0.18 and removed in 0.20;
         # train_test_split is imported from sklearn.model_selection instead.
         from sklearn.model_selection import train_test_split
         #help(train_test_split)

In [20]: # hold out 40% of the samples for testing; random_state fixes the shuffle
         x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.4,random_state = 4)
         knn = KNeighborsClassifier(n_neighbors=5)
         knn.fit(x_train,y_train)

Out[20]: KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                    metric_params=None, n_jobs=1, n_neighbors=5, p=2,
                    weights='uniform')

4. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and test it using the test dataset. Find the accuracy using the accuracy_score() function.

In [21]: from sklearn import metrics

In [22]: type(metrics)

Out[22]: module

In [23]: y_pred = knn.predict(x_test)
         metrics.accuracy_score(y_test,y_pred)

Out[23]: 0.96666666666666667

In [24]: metrics.confusion_matrix(y_test,y_pred)

Out[24]: array([[25,  0,  0],
                [ 0, 15,  2],
                [ 0,  0, 18]], dtype=int64)

5. Write a Python program to build a K-Nearest Neighbour classifier using scikit-learn and find the optimal value of k by plotting the accuracies for different values of k using the matplotlib library.

In [25]: k_range = range(1,30)
         score = []
         for k in k_range:
             # train and evaluate one classifier per value of k
             knn = KNeighborsClassifier(n_neighbors=k)
             knn.fit(x_train,y_train)
             y_pred = knn.predict(x_test)
             score.append(metrics.accuracy_score(y_test,y_pred))

In [26]: import matplotlib.pyplot as plt
         %matplotlib inline
         plt.plot(k_range,score)
         plt.xlabel('Values of k')
         plt.ylabel('Accuracy')

Out[26]: Text(0,0.5,'Accuracy')
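
The classifier repr in the cells above shows metric='minkowski', p=2 and weights='uniform', that is, Euclidean distance with one vote per neighbour. As a rough illustration of what that means, the following is a minimal pure-Python sketch of k-nearest-neighbour prediction and of an accuracy calculation. It is not how scikit-learn implements KNeighborsClassifier (which may use tree-based search and its own tie handling). The names knn_predict, accuracy and the toy_* data are hypothetical and introduced only for this example.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=1):
    """Predict the label of `sample` by majority vote among its k nearest training points."""
    # Euclidean distance, which is what metric='minkowski' with p=2 amounts to
    distances = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(row, sample))), label)
        for row, label in zip(train_x, train_y)
    ]
    # Sort by distance only; Python's sort is stable, so ties keep training order
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Uniform weights: every neighbour contributes exactly one vote
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Tiny made-up data set (hypothetical values, not taken from Iris)
toy_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_test_x = [[1.1, 1.0], [5.1, 5.0]]
toy_test_y = [0, 1]

# Same idea as the k sweep in exercise 5, on the toy data
for k in (1, 3, 5):
    preds = [knn_predict(toy_x, toy_y, s, k=k) for s in toy_test_x]
    print(k, preds, accuracy(toy_test_y, preds))
```

The accuracy function here computes the same quantity as metrics.accuracy_score in exercise 4: the number of correct predictions divided by the number of test samples.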