Green University of Bangladesh
Department of Computer Science and Engineering (CSE)
Faculty of Sciences and Engineering
Semester: Spring, Year: 2023, B.Sc. in CSE (Day)

Lab Report 04
Course Title: Machine Learning Lab
Course Code: CSE 412    Section: 201 D1
Lab Experiment Name: Perform Naïve Bayes Classifier on my dataset and report classification results.

Student Details
Name: Prohlad Mandal
ID: 201002267

Submission Date: 27-03-2023
Course Teacher's Name: Dr. Muhammad Abul Hasan

[For teacher's use only: Don't write anything inside this box]
Lab Project Status
Marks:            Signature:
Comments:         Date:

0.1 TITLE OF THE LAB EXPERIMENT
Perform Naïve Bayes Classifier on my dataset and report classification results.

0.2 OBJECTIVES/AIM [1 mark]
• To perform the Naive Bayes algorithm on my own dataset.
• To understand how the Naive Bayes algorithm actually works.
• To learn how we can classify the dataset.

0.3 PROCEDURE / ANALYSIS / DESIGN [2 marks]
Perform Naïve Bayes Classifier on my dataset:
1. Collect data from various resources and build a dataset.
2. Give a label to every data sample.
3. Prepare the dataset for training and testing.
4. After preparing the dataset, keep 20% of the data for testing and 80% for training.
5. Use various Python libraries, such as pandas and scikit-learn, to implement the algorithm.
6. After implementation, check the result.

0.4 IMPLEMENTATION [2 marks]
Python code to perform the Naive Bayes classifier on my dataset:

# import the pandas library for creating a data frame
import pandas as pd

# file path of the dataset stored in my Google Drive
# (the path is truncated in the original report and kept as-is here)
path = "/content/drive/MyDrive/ML/sport_and_politics_dataset/sports_and_politics.cs"

# create a data frame using the pandas library
df = pd.read_csv(path)

# show the first 5 rows of the dataset
df.head()

# output
Figure 1: Output of df.head() (first five rows of the dataset)

# show the last 5 rows of the dataset
df.tail()

# output
Figure 2: Output of df.tail() (last five rows of the dataset)

# check the class distribution
df['category'].value_counts()

# output
POLITICS    50
SPORTS      50
Name: category, dtype: int64

# separate X and Y
# X is the independent variable (text) and Y is the dependent variable (class label)
x = df.short_description.values
y = df.category.values

# split the dataset
from sklearn.model_selection import train_test_split

# separate the dataset into training and testing parts
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)

# data preprocessing: convert the text into word-count vectors
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
x_train = cv.fit_transform(xtrain)
x_train.toarray()

# output
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 1, 0]])

# apply the Naive Bayes algorithm
from sklearn.naive_bayes import MultinomialNB
model = MultinomialNB()
model.fit(x_train, ytrain)

# transform the test data with the same vectorizer and score the model
x_test = cv.transform(xtest)
model.score(x_test, ytest)

# output
0.6

# let's make an input array
# (the news text is truncated in the original report; judging by the two
#  predictions below, the list appears to have had a second entry that is not shown)
news = ['By Matt Yoder, Awful Announcing This video may be my favorite thing to com']

# preprocess the input data
cv_news = cv.transform(news)

# check the result
model.predict(cv_news)

# output
array(['SPORTS', 'POLITICS'], dtype='<U8')

0.5 TEST RESULT / OUTPUT [2 marks]
Result of the Naïve Bayes Classifier on my dataset:
Accuracy: 0.6 (60%)
Predicted result: Correct

0.6 ANALYSIS AND DISCUSSION [2 marks]
My dataset is small, containing only 100 samples; from these, I used 80% of the data for training and 20% for testing. I think this is why the accuracy is low, but despite the small dataset, the classifier can still predict correctly. I believe the accuracy will increase if I enlarge the dataset.
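Because the dataset has only 100 samples, a single 80/20 split gives a rather noisy accuracy estimate. As a possible extension (not part of the original lab code), the following minimal sketch uses 5-fold cross-validation; it assumes the same data frame df with the short_description and category columns, and wraps CountVectorizer and MultinomialNB in a scikit-learn Pipeline so the vectorizer is refit on the training folds only.

# minimal cross-validation sketch, assuming the same df as in the implementation above
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

x = df.short_description.values
y = df.category.values

# the pipeline refits the vectorizer inside every fold, avoiding information leakage
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())

scores = cross_val_score(pipeline, x, y, cv=5, scoring='accuracy')
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())

Averaging over five folds uses every sample for both training and testing, which should give a steadier picture of the model's accuracy on such a small dataset.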
0.7 SUMMARY
From this lab report, I learned how the Naive Bayes algorithm works and how to implement it using various Python libraries. To extend the dataset and the code to more than two classes, I can follow these steps (a sketch is given after the list):

1. Collect a larger dataset with multiple classes: I need a dataset that contains more than two classes. This can be done either by collecting new data or by finding an existing dataset that already has more than two classes.
2. Label the data: After collecting the data, I need to label each sample according to its class. This can be done manually or with an automated tool.
3. Update the code for multiple classes: To extend my code, I need to modify it to work with the new dataset, which mainly means handling more than two class labels.
4. Modify the loss function: I need a loss function that accommodates more than two classes, for example a categorical cross-entropy loss, which is designed for multiple classes.
5. Train the model: After updating the code and the loss function, I can train the model on the new dataset. I will need to tune the hyperparameters, such as the learning rate, batch size, and number of epochs, to ensure the model is properly trained.
6. Evaluate the model: Finally, I need to evaluate the model's performance on the new dataset by calculating the accuracy, precision, recall, and F1 score for each class. I can also visualize the performance with a confusion matrix or ROC curves.

By following these steps, I can extend my dataset and code to more than two classes and build a model that accurately classifies data into multiple classes.
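Below is a minimal, hypothetical sketch of how such a multi-class version might look with the same scikit-learn tools used in the implementation. The three labels (SPORTS, POLITICS, TECH) and the short texts are made-up placeholders, not real data. Note that scikit-learn's MultinomialNB already supports any number of classes, so with this particular classifier no explicit loss-function change is needed; step 4 mainly matters if the pipeline is later moved to a neural-network model.

# hypothetical multi-class sketch: three made-up classes and tiny placeholder texts
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

train_texts = [
    "the team won the championship game",
    "the striker scored twice in the final",
    "the election results were announced today",
    "parliament passed the new budget bill",
    "a new smartphone chip was unveiled",
    "the software update improves battery life",
]
train_labels = ["SPORTS", "SPORTS", "POLITICS", "POLITICS", "TECH", "TECH"]

test_texts = [
    "the coach praised the goalkeeper after the match",
    "the senator proposed a new tax law",
    "the laptop ships with a faster processor",
]
test_labels = ["SPORTS", "POLITICS", "TECH"]

# vectorize the text: fit the vocabulary on the training texts only
cv = CountVectorizer()
x_train = cv.fit_transform(train_texts)
x_test = cv.transform(test_texts)

# MultinomialNB handles more than two classes natively
model = MultinomialNB()
model.fit(x_train, train_labels)
pred = model.predict(x_test)

# per-class precision, recall and F1 score, plus the confusion matrix (step 6)
print(classification_report(test_labels, pred, zero_division=0))
print(confusion_matrix(test_labels, pred, labels=["SPORTS", "POLITICS", "TECH"]))

On a real dataset, the evaluation would of course be done on a proper held-out split (for example with train_test_split), exactly as in the two-class implementation above.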