Green University of Bangladesh
Department of Computer Science and Engineering (CSE)
Faculty of Sciences and Engineering
Semester: (Spring, Year: 2023), B.Sc. in CSE (Day)
Lab Report 04
Course Title: Machine Learning Lab
Course Code: CSE 412
Section: 201 D1
Lab Experiment Name: Perform Naïve Bayes Classifier on my dataset and report classification results.
Student Details
Name
Prohlad Mandal
ID
201002267
Submission Date: 27-03-2023
Course Teacher’s Name: Dr. Muhammad Abul Hasan
[For teacher's use only: Don't write anything inside this box]
Lab Project Status
Marks:
Signature:
Comments:
Date:
0.1 TITLE OF THE LAB EXPERIMENT
Perform Naïve Bayes Classifier on my dataset and report classification results.
0.2 OBJECTIVES/AIM [1 mark]
• To perform the Naive Bayes algorithm on my own dataset.
• To understand how the Naive Bayes algorithm actually works.
• To learn how to classify a dataset.
0.3 PROCEDURE / ANALYSIS / DESIGN [2 marks]
Performing the Naïve Bayes Classifier on my dataset:
1. Collect data from various sources and build a dataset.
2. Assign a label to every data point.
3. Prepare the dataset for training and testing.
4. After preparing the dataset, keep 20% of the data for testing and 80% for training (a short sketch of this split follows the list).
5. Use Python libraries such as pandas and scikit-learn to implement the algorithm.
6. After implementation, check the results.
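The preparation and split in steps 3-4 can be sketched as follows. This is only a minimal sketch: the file name and column names mirror the implementation in Section 0.4, and random_state is an extra assumption added here so that the split is reproducible.
#minimal sketch of steps 3-4 (file and column names follow Section 0.4)
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('sports_and_politics.csv')
#independent variable: the news text; dependent variable: the class label
x = df.short_description.values
y = df.category.values
#keep 80% of the rows for training and 20% for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
#for a 100-row dataset this prints 80 and 20
print(len(xtrain), len(xtest))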
0.4 IMPLEMENTATION [2 marks]
Python code to perform Naïve Bayes classification on my dataset:
#import pandas library for creating data frame
import pandas as pd
#file path which was stored in my drive
path = "/content/drive/MyDrive/ML/sport_and_politics_dataset/sports_and_politics.cs
#creating data frame using pandas library
df = pd.read_csv(path)
#show the first 5 rows of the dataset
df.head()
#output
Figure 1: Output of df.head() (first five rows of the dataset)
#show the last 5 rows of the dataset
df.tail()
#output
Figure 2: Output of df.tail() (last five rows of the dataset)
#check how many rows belong to each category
df['category'].value_counts()
#output
POLITICS    50
SPORTS      50
Name: category, dtype: int64
#separate X and y
#X is the independent variable (text) and y is the dependent variable (label)
x = df.short_description.values
y = df.category.values
#split dataset
from sklearn.model_selection import train_test_split
#separate the dataset into training and testing sets
xtrain, xtest, ytrain, ytest = train_test_split(x,y,test_size=0.2)
#data preprocessing
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
x_train = cv.fit_transform(xtrain)
x_train.toarray()
#output
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 1, 0]])
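To make the sparse count matrix above easier to read, here is a small toy illustration of what CountVectorizer produces. The two example sentences are made up for illustration only and are not rows from my dataset.
#toy illustration of CountVectorizer (example sentences, not my dataset)
from sklearn.feature_extraction.text import CountVectorizer
docs = ['the team won the match', 'the senate passed the bill']
cv_demo = CountVectorizer()
counts = cv_demo.fit_transform(docs)
#learned vocabulary in alphabetical order (get_feature_names() on scikit-learn < 1.0)
print(cv_demo.get_feature_names_out())
#one row per document, one column per vocabulary word, values are word counts
print(counts.toarray())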
#apply Naive Bayes Algorithm
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train, ytrain)
x_test = cv.transform(xtest)
model.score(x_test, ytest)
#output
0.6
#let's make an input array
news = ['By Matt Yoder, Awful Announcing This video may be my favorite thing to com
#preprocessing input data
cv_news = cv.transform(news)
#check the result
model.predict(cv_news)
#output
array(['SPORTS', 'POLITICS'], dtype='<U8')
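As an extra check that was not part of my original run, MultinomialNB can also report per-class probabilities, which show how confident each prediction is.
#column order of the probabilities follows model.classes_
print(model.classes_)
#probability of each class for every input text
print(model.predict_proba(cv_news))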
0.5 TEST RESULT / OUTPUT [2 marks]
The result of the Naïve Bayes Classifier on my dataset is:
Accuracy: 0.6 or 60%
Predicted Result: Correct
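Besides plain accuracy, per-class metrics can also be reported. The sketch below is an addition (these numbers were not computed in the original experiment) and reuses model, x_test, and ytest from Section 0.4.
from sklearn.metrics import classification_report, confusion_matrix
ypred = model.predict(x_test)
#rows are true classes, columns are predicted classes
print(confusion_matrix(ytest, ypred))
#precision, recall and F1 score for POLITICS and SPORTS separately
print(classification_report(ytest, ypred))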
0.6 ANALYSIS AND DISCUSSION [2 marks]
As my dataset is small, containing only 100 samples, I used 80% of the data for training and 20% for testing. I think this is why the accuracy is low, but despite the small dataset, the model can still predict correctly. I believe the accuracy will increase if I enlarge the dataset.
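One way to get a more stable accuracy estimate on a dataset of only 100 samples is k-fold cross-validation. This was not part of the lab procedure; the sketch below is an assumption that reuses the x and y arrays defined in Section 0.4.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score
#combine vectorizer and classifier so each fold is vectorized on its own training split
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
#5-fold cross-validation: accuracy averaged over five different train/test splits
scores = cross_val_score(pipe, x, y, cv=5)
print(scores.mean(), scores.std())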
0.7 SUMMARY
From this lab, I have learned how the Naive Bayes algorithm works and how to implement it using various Python libraries.
To extend my dataset and code to more than two classes, I can follow these steps:
1. Collecting a larger dataset with multiple classes: I will need to collect a dataset
that has more than two classes. This can be done by either collecting new data or
by finding an existing dataset that has more than two classes.
2. Labeling the data: After collecting the data, I will need to label it according to
the different classes. This can be done manually or by using an automated tool.
3. Updating the code for multiple classes: To extend my code beyond two classes, I will need to modify it to work with the new dataset, so that label handling and prediction cover every class.
4. Modifying the loss function: I will need to modify the loss function to accommodate more than two classes. This can be done by using a categorical cross-entropy loss function, which is designed to handle multiple classes.
5. Training the model: After updating my code and modifying the loss function, I
can train my model on the new dataset. I will need to adjust the hyperparameters,
such as learning rate, batch size, and the number of epochs, to ensure that the
model is properly trained.
6. Evaluating the model: Finally, I will need to evaluate the performance of my
model on the new dataset. This can be done by calculating the accuracy, precision,
recall, and F1 score for each class. I can also visualize the performance using
confusion matrices or ROC curves. By following these steps, I can extend my
dataset and code for more than two classes and create a model that can accurately
classify data into multiple classes.
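For the model used in this report, scikit-learn's MultinomialNB already supports any number of classes, so mainly the labels and the evaluation change. Below is a minimal multi-class sketch with a hypothetical third class (TECH) and toy sentences that are not part of my dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
#toy texts and labels with three classes (hypothetical example only)
texts = ['the team won the cup', 'parliament passed the budget', 'a new phone was released']
labels = ['SPORTS', 'POLITICS', 'TECH']
cv3 = CountVectorizer()
x3 = cv3.fit_transform(texts)
model3 = MultinomialNB()
model3.fit(x3, labels)
#precision, recall and F1 score reported for each of the three classes
print(classification_report(labels, model3.predict(x3)))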