Lab 9

Lab Task: Implement QDA on a dataset and explain it with comments. Each student should use a different dataset.

Dataset: Breast cancer

Output:
[Screenshots of the QDA code and its console output]

Conclusion:
QDA works best when the variances are very different between classes and there are enough observations to estimate those variances accurately. LDA works best when the variances are similar among classes, or when there is not enough data to estimate the variances accurately.

The output on the breast cancer dataset shows an accuracy of about 97%, which indicates that the QDA classifier is effective for this supervised classification task. In contrast to QDA, the LDA classifier achieved an accuracy of about 95% on the same dataset.

Lab 10

Lab Task: Implement a Gaussian Naive Bayes classifier on the iris dataset using scikit-learn.

Step 1 - Import the library
Import the required modules: datasets and metrics from scikit-learn, and GaussianNB from sklearn.naive_bayes.

Step 2 - Setup the Data
Use datasets to load the built-in iris dataset, and create the objects X and y to store the data and the target values, respectively.

Step 3 - Model and its Score
• Take GaussianNB as the machine learning model and fit it to the training data.
• Predict the output by passing X_test to the model, and store the true targets in expected_y.
• Print the classification report and confusion matrix for the classifier.

Output:
[Screenshots of the classification report and confusion matrix]

Conclusion: An accuracy of 100% was achieved with the test size fixed at 0.28 (28%), which shows that the Gaussian Naive Bayes algorithm is effective for this supervised classification task.
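The code itself appears only as screenshots in the report. A minimal sketch of the three steps above, assuming the standard scikit-learn API (the random_state below is arbitrary, so the exact 100% figure may not reproduce):

import numpy as np
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Step 2 - load the built-in iris dataset into X (data) and y (target)
X, y = datasets.load_iris(return_X_y=True)

# hold out 28% of the samples for testing, as in the conclusion above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.28, random_state=1)
expected_y = y_test  # the true targets, kept under the name used in Step 3

# Step 3 - fit the model, predict, and report its score
model = GaussianNB()
model.fit(X_train, y_train)
predicted_y = model.predict(X_test)

print(metrics.classification_report(expected_y, predicted_y))
print(metrics.confusion_matrix(expected_y, predicted_y))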
Lab 11

Lab Task: Implementing PCA in Python with scikit-learn on the iris dataset.

Code

# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

# importing or loading the dataset
dataset = datasets.load_iris()

# distributing the dataset into two components, X and y
X = dataset.data
y = dataset.target

# splitting X and y into the training set and testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=75)

# performing the preprocessing part
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# applying the PCA function on the training and testing set of X
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# fitting logistic regression to the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# predicting the test set result using the predict function of LogisticRegression
y_pred = classifier.predict(X_test)

# making the confusion matrix between the test set of y and the predicted values
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

# visualising the training set results through a scatter plot
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1')  # x-axis label
plt.ylabel('PC2')  # y-axis label
plt.legend()       # show the legend
plt.show()         # show the scatter plot

# visualising the test set results through a scatter plot
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()

Output Plot:
[Decision-region scatter plots: Logistic Regression (Training set) and Logistic Regression (Test set)]

Lab 12

Experiment No. 12: Simple Neural Network in Python

import numpy as np
import matplotlib.pyplot as plt  # to plot the error during training

# input
inputs = np.array([[0, 1, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [1, 0, 1]])
# output: for this data the target equals the first element of each input
outputs = np.array([[0], [0], [1], [1]])

# create the NeuralNetwork class
class NeuralNetwork:

    # initialise the variables in the class
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs
        # initialise all weights as 0.50 for simplicity
        self.weights = np.array([[.50], [.50], [.50]])
        self.error_history = []
        self.epoch_list = []

    # activation function ==> S(x) = 1 / (1 + e^(-x))
    def sigmoid(self, x, deriv=False):
        if deriv:
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))
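    # NOTE on the deriv branch above: when deriv=True, x is expected to be a
    # value that has already been passed through the sigmoid. Because
    # S'(z) = S(z) * (1 - S(z)), the derivative can be computed from the
    # activated value alone as x * (1 - x). This is why backpropagation below
    # calls self.sigmoid(self.hidden, deriv=True) on the network's output
    # rather than on the pre-activation values.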
    # data will flow forward through the neural network
    def feed_forward(self):
        self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))

    # go backwards through the network to update the weights
    def backpropagation(self):
        self.error = self.outputs - self.hidden
        delta = self.error * self.sigmoid(self.hidden, deriv=True)
        self.weights += np.dot(self.inputs.T, delta)

    # train the neural net for 25,000 iterations
    def train(self, epochs=25000):
        for epoch in range(epochs):
            # flow forward and produce an output
            self.feed_forward()
            # go back through the network to make corrections based on the output
            self.backpropagation()
            # keep track of the error history over each epoch
            self.error_history.append(np.average(np.abs(self.error)))
            self.epoch_list.append(epoch)

    # function to predict the output for new, unseen input data
    def predict(self, new_input):
        prediction = self.sigmoid(np.dot(new_input, self.weights))
        return prediction

# create the neural network
NN = NeuralNetwork(inputs, outputs)
# train the neural network
NN.train()

# create two new examples to predict
example = np.array([[1, 1, 0]])
example_2 = np.array([[0, 1, 1]])

# print the predictions for both examples; since the target equals the first
# element of each input, example[0][0] is the correct answer
print(NN.predict(example), ' - Correct: ', example[0][0])
print(NN.predict(example_2), ' - Correct: ', example_2[0][0])

# plot the error over the entire training duration
plt.figure(figsize=(15, 5))
plt.plot(NN.epoch_list, NN.error_history)
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.show()

Output:
[Console output with the two predictions, and the plot of training error against epoch]

Lab 14

Experiment No. 14: Identify Overfitting Machine Learning Models in Scikit-Learn

STEP #1: import the modules needed to evaluate decision tree performance on the train and test sets at different tree depths

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from matplotlib import pyplot

STEP #2: define a synthetic classification dataset

X, y = make_classification(n_samples=10000, n_features=20, n_informative=5, n_redundant=15, random_state=1)

STEP #3: summarize the dataset

print(X.shape, y.shape)

STEP #4: split the dataset into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

STEP #5: summarize the shape of the train and test sets

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

STEP #6: define the tree depths to evaluate, and lists to collect the scores

values = [i for i in range(1, 51)]
train_scores, test_scores = list(), list()

STEP #7: evaluate a decision tree for each depth

for i in values:
    # configure the model
    model = DecisionTreeClassifier(max_depth=i)
    # fit the model on the training dataset
    model.fit(X_train, y_train)
    # evaluate on the train dataset
    train_yhat = model.predict(X_train)
    train_acc = accuracy_score(y_train, train_yhat)
    train_scores.append(train_acc)
    # evaluate on the test dataset
    test_yhat = model.predict(X_test)
    test_acc = accuracy_score(y_test, test_yhat)
    test_scores.append(test_acc)
    # summarize progress
    print('>%d, train: %.3f, test: %.3f' % (i, train_acc, test_acc))

Output:
[Console output listing the train and test accuracy for each tree depth]

Lab Task: Implement KNN on any dataset and choose different values of K to see how K impacts the accuracy of the predictions.
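The KNN implementation itself appears only as screenshots in the report. A minimal sketch of the experiment, assuming the iris dataset, scikit-learn's KNeighborsClassifier, and a K range of 1 to 40 (all of these are assumptions rather than details taken from the report):

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# load a dataset and split it (the lab allows any dataset; iris is assumed here)
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# try K = 1..40 and record the mean error on the test set for each value
mean_error = []
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)
    mean_error.append(np.mean(pred != y_test))

# plot the mean error against K to pick an optimal value
plt.plot(range(1, 41), mean_error, marker='o')
plt.xlabel('K')
plt.ylabel('Mean Error')
plt.title('Mean Error vs. K')
plt.show()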
Output Plot:
[Plot of the mean error against the value of K]

Conclusion: From the output plot we can see that the mean error is at its lowest (zero on this split) when the value of K is between 3 and 5. For the final model we can choose K = 3 as the optimal value.

Lab 8

Plot the data as a function of the two LDA components.
[Scatter plot of the two LDA components]

The code below obtains the variance explained by each LDA component. To compare with LDA, we then plot the data as a function of the two PCA components and obtain the variance explained by each of those.
[Scatter plot of the two PCA components]

Conclusion: Here we obtain the variance explained by each component under LDA and compare it with PCA. Both plots lead to the same conclusion: PCA selects the components that produce the highest spread (retaining the most information), not necessarily the ones that maximise the separation between classes.

HOME TASK LAB 6

Apply logistic regression to the 'weather.csv' dataset and report an analysis of its results.

Dataset - this dataset contains weather information: temperature, outlook, humidity, windy, and play. We use it to predict whether a user will play in humid and windy weather.

Confusion Matrix:
[[1 3]
 [8 2]]

Reading the matrix as [[TN FP] [FN TP]]: correct predictions = TN + TP = 1 + 2 = 3, and incorrect predictions = FP + FN = 3 + 8 = 11, out of 14 test samples.

Accuracy (as reported by the script): 0.42

The low accuracy shows that the logistic regression model generalises poorly on this small dataset.

LAB 2

Experiment No. 2: Getting Started with Python

STEP #1: Univariate Plots
Univariate plots are plots of each individual variable.

STEP #2: Histograms
A histogram of each input variable gives an idea of its distribution.

STEP #3: Multivariate Plots
Multivariate plots show the interactions between the variables.

Lab 3

Assignment
As a programmer, assignment and types should not be surprising to you. The examples cover Numbers, Booleans, and Multiple Assignment.

Flow Control
There are three main types of flow control that you need to learn: If-Then-Else conditions, For-Loops, and While-Loops.

Tuple
Tuples are read-only collections of items.

List
Lists use the square-bracket notation and can be indexed using array notation.

Dictionary
Dictionaries are mappings of names to values, like a map. Note the use of the curly-bracket notation.

Functions
The biggest gotcha with Python is the whitespace. Ensure that you have an empty new line after indented code. The example defines a new function to calculate the sum of two values and calls the function with two arguments.

NumPy
NumPy provides the foundation data structures and operations for SciPy. These are arrays (ndarrays) that are efficient to define and manipulate.

Create Array and Access Data
Array notation and ranges can be used to efficiently access data in a NumPy array.

Arithmetic
NumPy arrays can be used directly in arithmetic.

Line Plot
The example creates a simple line plot from one-dimensional data.

Scatter Plot
A simple example of creating a scatter plot from two-dimensional data.

Series
A series is a one-dimensional array where the rows and columns can be labeled.

Lab 4

Experiment No. 4: Learning Model Building in Scikit-learn: A Python Machine Learning Library

Step #1: Loading an exemplar dataset using scikit-learn
Output: [screenshot of the loaded dataset]

Step #2: Loading an external dataset using the pandas library
Output: [screenshot of the loaded dataset]

Step #3: Splitting the dataset
Output: [screenshot of the resulting train/test shapes]

Lab 5

Experiment No. 5: Linear Regression

Lab Task:
• Implement the multiple linear regression technique on the Boston house pricing dataset using scikit-learn.
• Report the estimated coefficients obtained in the linear regression code.

Output:
[Plot of the train and test data for the Boston dataset, and the console output with the estimated coefficients]

Updated lab task, using the gradient descent method (a minimal sketch is given after this section):

Output:
[Console output with the values of t0 and t1, the scatter plot with the fitted regression line, and the gradient descent plot]
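The gradient-descent code itself appears only as screenshots. Below is a minimal sketch of simple linear regression y = t0 + t1*x fitted by batch gradient descent; the synthetic data, learning rate, and epoch count are all assumptions (the original used the Boston dataset, whose loader has been removed from recent scikit-learn releases):

import numpy as np
import matplotlib.pyplot as plt

# synthetic one-feature data standing in for the feature used in the lab
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + 1.0 + rng.normal(0, 2, 100)

# model: y_hat = t0 + t1 * x; minimise mean squared error by batch gradient descent
t0, t1 = 0.0, 0.0
lr = 0.01        # learning rate (assumed)
epochs = 1000    # number of iterations (assumed)
n = len(x)
history = []
for _ in range(epochs):
    y_hat = t0 + t1 * x
    # gradients of MSE = (1/n) * sum((y_hat - y)^2) w.r.t. t0 and t1
    g0 = (2 / n) * np.sum(y_hat - y)
    g1 = (2 / n) * np.sum((y_hat - y) * x)
    t0 -= lr * g0
    t1 -= lr * g1
    history.append(np.mean((y_hat - y) ** 2))

print('t0 =', t0, 't1 =', t1)

# scatter plot with the fitted regression line
plt.scatter(x, y, s=10)
plt.plot(np.sort(x), t0 + t1 * np.sort(x), color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

# gradient descent convergence plot: MSE against iteration
plt.plot(history)
plt.xlabel('Iteration')
plt.ylabel('MSE')
plt.show()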
LAB 13

Experiment No. 13: Handwritten Digit Recognition (Using Scikit-Learn)

Step #1: Loading the dataset

# importing the dataset
from sklearn.datasets import load_digits
digits = load_digits()

# the following command prints the shape of the dataset
print("Image Data Shape", digits.data.shape)  # there are 1797 images in the dataset

Step #2: Visualizing the images and labels in our dataset

# here we visualize the first 5 images in the dataset
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 4))
for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8, 8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize=20)

Step #3: Splitting our dataset into training and testing sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.05, random_state=95)

Step #4: The scikit-learn 4-step modeling pattern

# 1. import the model we want to use
from sklearn.linear_model import LogisticRegression

# 2. make an instance of the model
logisticRegr = LogisticRegression()

# 3. train the model
logisticRegr.fit(x_train, y_train)

# 4. predict the labels of new data
predictions = logisticRegr.predict(x_test)

Step #5: Measuring the performance of our model

# accuracy = fraction of correct predictions on the test set
score = logisticRegr.score(x_test, y_test)
print(score)

Step #6: Confusion matrix

# using seaborn to draw the confusion matrix as a heatmap
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics

cm = metrics.confusion_matrix(y_test, predictions)
plt.figure(figsize=(9, 9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap='Pastel1')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title, size=15)

Output:
[Console output with the dataset shape and accuracy score, the five sample digit images, and the confusion-matrix heatmap]
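The lab measures overall accuracy and draws the confusion matrix; a per-digit breakdown of precision, recall, and F1 can be obtained with one more call. This is a small optional extension, not part of the original lab:

# per-class precision, recall, and F1 for the ten digit classes
from sklearn.metrics import classification_report
print(classification_report(y_test, predictions))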