
MACHINE LEARNING LAB REPORTS

Lab 9
Lab Task:
Implement QDA on any dataset and explain it with comments. Each student should implement it on a different dataset.
Dataset:
Breast cancer
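Code (a minimal sketch, assumed rather than the original submission, comparing QDA with LDA on scikit-learn's built-in breast cancer dataset):
# Sketch (assumed): QDA vs. LDA on the built-in breast cancer dataset
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# load features X and binary labels y (malignant / benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# QDA fits a separate covariance matrix for each class
qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)
print('QDA accuracy:', accuracy_score(y_test, qda.predict(X_test)))

# LDA assumes one shared covariance matrix across classes
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
print('LDA accuracy:', accuracy_score(y_test, lda.predict(X_test)))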
Output:
Conclusion:
• QDA will work best when the variances are very different between classes and we have enough observations to accurately estimate them.
• LDA will work best when the variances are similar among classes or we don't have enough data to accurately estimate them.
The output on the breast cancer dataset shows that an accuracy of about 97% was achieved, which indicates that the QDA classifier is effective for supervised classification.
In contrast to QDA, the LDA classifier achieved an accuracy of about 95% on the same breast cancer dataset.
Lab 10
Lab Task:
Implement a Gaussian Naive Bayes classifier using scikit-learn on the iris dataset.
Step 1 - Import the library
Imported various modules like datasets, metrics, and GaussianNB from different libraries.
Step 2 - Setup the Data
Here we used datasets to load the built-in iris dataset and created objects X and y to store the data and the target values respectively.
Step 3 - Model and its Score
• Took GaussianNB as the machine learning model and fit it to the data.
• Then predicted the output by passing X_test, and stored the real targets in expected_y.
• Then printed the classification report and confusion matrix for the classifier.
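Code (a minimal sketch, assumed; the 0.28 test size matches the conclusion below):
# Step 1 - import the library
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Step 2 - setup the data: load the built-in iris dataset
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.28, random_state=0)

# Step 3 - model and its score
model = GaussianNB()
model.fit(X_train, y_train)
expected_y = y_test
predicted_y = model.predict(X_test)
print(metrics.classification_report(expected_y, predicted_y))
print(metrics.confusion_matrix(expected_y, predicted_y))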
Output:
Conclusion:
An accuracy of 100% was achieved when the test size was fixed at 0.28 (28%), which shows that the Gaussian Naïve Bayes algorithm is effective for supervised classification.
Lab 11
Lab Task: Implementing PCA in Python with Scikit-Learn on the Iris dataset.
Code
# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
# importing or loading the datasets
dataset = datasets.load_iris()
# distributing the dataset into two components X and Y
X = dataset.data ; y = dataset.target
# Splitting the X and Y into the
# Training set and Testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 75)
# performing preprocessing part
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Applying PCA function on training
# and testing set of X component
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
# Fitting Logistic Regression To the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the test set result using
# predict function under LogisticRegression
y_pred = classifier.predict(X_test)
# making confusion matrix between
# test set of Y and predicted value.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Predicting the training set
# result through scatter plot
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
                               stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1,
                               stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha = 0.75,
             cmap = ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('PC1') # for Xlabel
plt.ylabel('PC2') # for Ylabel
plt.legend() # to show legend
# show scatter plot
plt.show()
# Visualising the Test set results through scatter plot
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
                               stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1,
                               stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
             X2.ravel()]).T).reshape(X1.shape), alpha = 0.75,
             cmap = ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
# title for scatter plot
plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1') # for Xlabel
plt.ylabel('PC2') # for Ylabel
plt.legend()
# show scatter plot
plt.show()
Output Plots:
Logistic Regression (Training set)
Logistic Regression (Test set)
Lab 12
Experiment No. 12: Simple Neural Network in Python
import numpy as np
import matplotlib.pyplot as plt # to plot error during training

# input
inputs = np.array([[0,1,0],
                   [0,1,1],
                   [1,1,0],
                   [1,0,1]])
# output
outputs = np.array([[0],[0],[1],[1]])

# create NeuralNetwork class
class NeuralNetwork:
    # initialize variables in class
    def __init__(self, inputs, outputs):
        self.inputs = inputs
        self.outputs = outputs
        # initialize weights as .50 for simplicity
        self.weights = np.array([[.50], [.50], [.50]])
        self.error_history = []
        self.epoch_list = []

    # activation function ==> S(x) = 1/(1+e^(-x))
    def sigmoid(self, x, deriv=False):
        if deriv == True:
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))

    # data will flow through the neural network
    def feed_forward(self):
        self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))

    # going backwards through the network to update weights
    def backpropagation(self):
        self.error = self.outputs - self.hidden
        delta = self.error * self.sigmoid(self.hidden, deriv=True)
        self.weights += np.dot(self.inputs.T, delta)

    # train the neural net for 25,000 iterations
    def train(self, epochs=25000):
        for epoch in range(epochs):
            # flow forward and produce an output
            self.feed_forward()
            # go back through the network to make corrections based on the output
            self.backpropagation()
            # keep track of the error history over each epoch
            self.error_history.append(np.average(np.abs(self.error)))
            self.epoch_list.append(epoch)

    # function to predict output on new and unseen input data
    def predict(self, new_input):
        prediction = self.sigmoid(np.dot(new_input, self.weights))
        return prediction

# create neural network
NN = NeuralNetwork(inputs, outputs)
# train neural network
NN.train()

# create two new examples to predict
example = np.array([[1, 1, 0]])
example_2 = np.array([[0, 1, 1]])

# print the predictions for both examples
print(NN.predict(example), ' - Correct: ', example[0][0])
print(NN.predict(example_2), ' - Correct: ', example_2[0][0])

# plot the error over the entire training duration
plt.figure(figsize=(15,5))
plt.plot(NN.epoch_list, NN.error_history)
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.show()
Output:
Plot:
Lab 14
Experiment No. 14: Identify Overfitting Machine Learning Models In Scikit-Learn
STEP#1 import the libraries needed to evaluate decision tree performance on train and test sets with different tree depths
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from matplotlib import pyplot
STEP#2 define a synthetic classification dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=5, n_redundant=15,
                           random_state=1)
STEP#3 summarize the dataset
print(X.shape, y.shape)
STEP#4 split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
STEP#5 summarize the shape of the train and test sets
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
STEP#6 define the tree depths to evaluate and lists to collect the scores
values = [i for i in range(1, 51)]
train_scores, test_scores = list(), list()
STEP#7 evaluate a decision tree for each depth
for i in values:
    # configure the model
    model = DecisionTreeClassifier(max_depth=i)
    # fit model on the training dataset
    model.fit(X_train, y_train)
    # evaluate on the train dataset
    train_yhat = model.predict(X_train)
    train_acc = accuracy_score(y_train, train_yhat)
    train_scores.append(train_acc)
    # evaluate on the test dataset
    test_yhat = model.predict(X_test)
    test_acc = accuracy_score(y_test, test_yhat)
    test_scores.append(test_acc)
    # summarize progress
    print('>%d, train: %.3f, test: %.3f' % (i, train_acc, test_acc))
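To see where overfitting sets in, the collected scores can be plotted against tree depth; a short sketch (assumed, completing the otherwise unused pyplot import from STEP#1):
# plot train vs. test accuracy for each tree depth
pyplot.plot(values, train_scores, '-o', label='Train')
pyplot.plot(values, test_scores, '-o', label='Test')
pyplot.xlabel('Tree depth')
pyplot.ylabel('Accuracy')
pyplot.legend()
pyplot.show()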
Lab Task:
Implement KNN on any data set and choose different values of K to see how it impacts the accuracy
of the predictions.
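Code (a minimal sketch, assumed; the iris dataset and the K range of 1 to 20 are illustrative choices):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# record the mean prediction error for each K
error = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error.append(np.mean(knn.predict(X_test) != y_test))

# plot mean error against K to pick the best value
plt.plot(range(1, 21), error, marker='o')
plt.xlabel('K')
plt.ylabel('Mean Error')
plt.show()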
Output Plot:
Conclusion:
From the output we can see that the mean error is zero when the value of K is between 3 and 5.
For our final model we can choose an optimal value of K of 3 (which falls in that range).
Lab 8
Plot the data as a function of the two LDA components
The code below obtains the variance explained by each LDA component and plots the data as a function of the two LDA components.
To compare with LDA, it also obtains the variance explained by each PCA component and plots the data as a function of the two PCA components.
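A minimal sketch (assumed, using the iris dataset) of both steps:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# LDA: supervised, picks components that maximize class separation
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print('LDA explained variance ratio:', lda.explained_variance_ratio_)

# PCA: unsupervised, picks components with the highest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print('PCA explained variance ratio:', pca.explained_variance_ratio_)

# plot the data as a function of the two components of each method
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_lda[:, 0], X_lda[:, 1], c=y)
axes[0].set_title('LDA components')
axes[1].scatter(X_pca[:, 0], X_pca[:, 1], c=y)
axes[1].set_title('PCA components')
plt.show()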
Conclusion:
In this lab we obtained the variance explained by each component for LDA and compared it with PCA.
From both plots we conclude that PCA selects the components that result in the highest spread (retaining the most information), not necessarily the ones that maximize the separation between classes.
HOME TASK LAB 6
Apply Logistic Regression to the 'weather.csv' dataset and report an analysis of its results.
Dataset – This dataset contains weather information: temperature, outlook, humidity, windy, and play. We use it to predict whether a user will play given humid and windy weather.
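A minimal sketch (assumed; the column names follow the description above, and the yes/no encoding of windy and play is hypothetical):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# load the local weather.csv file
data = pd.read_csv('weather.csv')

# hypothetical encoding: assumes humidity is numeric and windy/play hold yes-no values
X = data[['humidity']].copy()
X['windy'] = (data['windy'].astype(str).str.lower() == 'yes').astype(int)
y = (data['play'].astype(str).str.lower() == 'yes').astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print('Confusion Matrix :\n', confusion_matrix(y_test, y_pred))
print('Accuracy :', accuracy_score(y_test, y_pred))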
Confusion Matrix :
[[1 3]
[ 8 2]]
Out of the 14 test samples (reading the matrix as [[TN, FP], [FN, TP]]):
TrueNegative + TruePositive = 1 + 2
FalsePositive + FalseNegative = 3 + 8
Accuracy : 0.42
Lab 2
Experiment No. 2: Getting Started with Python
STEP #1 Univariate Plots
Univariate plots – plots of each individual variable.
STEP #2 Histogram of each input variable to get an idea of the distribution.
STEP#3 Multivariate Plots
Interactions between the variables.
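A minimal sketch (assumed, loading the iris data into a pandas DataFrame) covering the three steps above:
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# load iris as a DataFrame (assumed data source for this lab)
df = load_iris(as_frame=True).frame.drop(columns='target')

# STEP #1/#2: univariate plots - a histogram of each input variable
df.hist()
plt.show()

# STEP #3: multivariate plots - scatter matrix of variable interactions
scatter_matrix(df)
plt.show()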
Lab 3
Assignment
As a programmer, assignment and types should not be surprising to you.
Output:
Numbers
Boolean
Multiple Assignments
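A small sketch (assumed) of the assignment examples above:
# Numbers
value = 123.1
print(value)
# Boolean
a = True
b = False
print(a, b)
# Multiple assignment
x, y, z = 1, 2, 3
print(x, y, z)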
Flow Control
There are three main types of flow control that you need to learn (sketched below):
• If-Then-Else conditions
• For loops
• While loops
For Loops:
While-Loop:
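A small sketch (assumed) illustrating all three constructs:
# If-Then-Else condition
value = 99
if value == 99:
    print('That is fast')
elif value > 200:
    print('That is too fast')
else:
    print('That is safe')
# For loop
for i in range(10):
    print(i)
# While loop
i = 0
while i < 10:
    print(i)
    i += 1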
Tuple:
Tuples are read-only collections of items.
List:
Lists use the square bracket notation and can be indexed using array notation.
Dictionary:
Dictionaries are mappings of names to values. Note the use of the curly bracket
notation.
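A small sketch (assumed) of the three collection types:
# Tuple: read-only collection of items
a = (1, 2, 3)
print(a)
# List: square bracket notation, indexed with array notation
mylist = [1, 2, 3]
print("Zeroth value: %d" % mylist[0])
mylist.append(4)
print("List length: %d" % len(mylist))
# Dictionary: mapping of names to values with curly bracket notation
mydict = {'a': 1, 'b': 2, 'c': 3}
print("A value: %d" % mydict['a'])
mydict['a'] = 11
print("Keys: %s" % list(mydict.keys()))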
Functions
The biggest gotcha with Python is the whitespace. Ensure that you have an empty new line after
indented code. The example below defines a new function to calculate the sum of two values and
calls the function with two arguments.
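A small sketch (assumed) of the function described above:
# define a new function to calculate the sum of two values
def mysum(x, y):
    return x + y

# call the function with two arguments
result = mysum(1, 3)
print(result)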
NumPy
NumPy provides the foundation data structures and operations for SciPy. These are arrays (ndarrays)
that are efficient to define and manipulate.
Create Array
Access Data
Array notation and ranges can be used to efficiently access data in a NumPy array.
Arithmetic
NumPy arrays can be used directly in arithmetic.
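A small sketch (assumed) of the NumPy operations described above:
import numpy as np
# create an array from a list
mylist = [1, 2, 3]
myarray = np.array(mylist)
print(myarray, myarray.shape)
# access data with array notation and ranges
print("First item: %d" % myarray[0])
print("Last item: %d" % myarray[-1])
print("First two items: %s" % myarray[0:2])
# arithmetic directly on arrays
other = np.array([4, 5, 6])
print("Addition: %s" % (myarray + other))
print("Multiplication: %s" % (myarray * other))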
Line Plot
The example below creates a simple line plot from one-dimensional data.
Scatter Plot
Below is a simple example of creating a scatter plot from two-dimensional data.
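A small sketch (assumed) of both plots:
import numpy as np
import matplotlib.pyplot as plt
# line plot from one-dimensional data
myarray = np.array([1, 2, 3])
plt.plot(myarray)
plt.xlabel('some x axis')
plt.ylabel('some y axis')
plt.show()
# scatter plot from two-dimensional data
x = np.array([1, 2, 3])
y = np.array([2, 4, 6])
plt.scatter(x, y)
plt.show()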
Series
A series is a one-dimensional array where the rows can be labeled.
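A small sketch (assumed) of a labeled pandas Series:
import numpy as np
import pandas as pd
myarray = np.array([1, 2, 3])
rownames = ['a', 'b', 'c']
myseries = pd.Series(myarray, index=rownames)
print(myseries)
print(myseries['a'])  # access a row by its label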
Lab 4
Experiment No. 4: Learning Model Building in Scikit-learn: A Python Machine Learning Library
Step #1 Loading exemplar dataset using scikit-learn
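A minimal sketch (assumed, using the iris dataset as the exemplar):
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
print(iris.feature_names)
print(iris.target_names)
print(X.shape, y.shape)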
Output
Step #2 Loading an external dataset using the pandas library
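A minimal sketch (assumed; 'data.csv' is a hypothetical local file):
import pandas as pd
data = pd.read_csv('data.csv')
print(data.shape)
print(data.head())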
Output
Step #3 Splitting the dataset
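A minimal sketch (assumed, continuing from the iris data of Step #1):
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)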
Output
Lab 5
Experiment No. 5: Linear Regression
Lab Task:
• Implement the multiple linear regression technique on the Boston house pricing dataset using Scikit-learn (see the sketch below).
• Mention the estimated coefficients obtained in the linear regression code.
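A minimal sketch (assumed; load_boston was removed in scikit-learn 1.2, so this assumes an older version where it is still available):
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# load the Boston house pricing data
boston = load_boston()
X, y = boston.data, boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit the multiple linear regression model
reg = LinearRegression()
reg.fit(X_train, y_train)

# the estimated coefficients asked for in the lab task
print('Coefficients:', reg.coef_)
print('Intercept:', reg.intercept_)
print('Test R^2:', reg.score(X_test, y_test))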
Plot: train data and test data of the Boston dataset
Output console:
Updated lab task using GRADIENT DESCENT METHOD:
Value of t0, t1
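A minimal sketch (assumed; the 1-D data is hypothetical, standing in for one feature against the target) of fitting t0 and t1 by batch gradient descent:
import numpy as np

# hypothetical data: one feature x against target y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

t0, t1 = 0.0, 0.0   # intercept and slope
lr = 0.01           # learning rate
for _ in range(10000):
    error = (t0 + t1 * x) - y
    # gradient of the mean squared error with respect to t0 and t1
    t0 -= lr * 2 * error.mean()
    t1 -= lr * 2 * (error * x).mean()

print('Value of t0, t1:', t0, t1)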
Scatter plot and regression line:
Gradient descent plot:
Lab 13
Experiment No. 13: Handwritten digit recognition (Using Scikit-Learn)
Step #1: Loading the Dataset
#importing the dataset
from sklearn.datasets import load_digits
digits = load_digits()
Use the following command to know the shape of the dataset:
print("Image Data Shape" , digits.data.shape)
# There are 1797 images in the dataset
Step #2: Visualizing the images and labels in our Dataset
#Here we are visualizing the first 5 images in the Dataset
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(20,4))
for index, (image, label) in enumerate(zip(digits.data[0:5], digits.target[0:5])):
    plt.subplot(1, 5, index + 1)
    plt.imshow(np.reshape(image, (8,8)), cmap=plt.cm.gray)
    plt.title('Training: %i\n' % label, fontsize = 20)
Step #3: Splitting our Dataset into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.05, random_state=95)
Step #4: The Scikit-Learn 4-Step Modeling Pattern
Step 1. Importing the model we want to use.
from sklearn.linear_model import LogisticRegression
Step 2. Making an instance of the Model
logisticRegr = LogisticRegression()
Step 3. Training the Model
logisticRegr.fit(x_train, y_train)
Step 4. Predicting the labels of new data
predictions = logisticRegr.predict(x_test)
Step #5: Measuring the performance of our Model
Use the model's score method (mean accuracy):
score = logisticRegr.score(x_test, y_test)
print(score)
Step #6: Confusion matrix
Using Seaborn for our confusion matrix.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, predictions)
plt.figure(figsize=(9,9))
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square = True, cmap = 'Pastel1');
plt.ylabel('Actual label');
plt.xlabel('Predicted label');
all_sample_title = 'Accuracy Score: {0}'.format(score)
plt.title(all_sample_title, size = 15);