Uploaded by Marwa Maiwa

Bonus project

Individual Project - Bonus
The main objective of this project is to prepare the datasets to feed different machine
learning algorithms for a classification task. For this you should:




Clean the dataset: make sure the data do not contain invalid characters, remove
rows with empty values, remove duplicate columns, and fill NaN and empty data
cells with zero values.
Label the data: encode the categorical labels as integers (0–N), where N is the number of labels.
Plot the summary of the dataset distribution.
Plot the correlation matrix (Pearson's).
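The four steps above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function names (clean_dataset, encode_labels, pearson_matrix, plot_summary) and the label column name are hypothetical, "invalid characters" is read as non-ASCII characters, only rows that are entirely empty are dropped so that the remaining gaps can still be filled with zeros, and the "summary of the dataset distribution" is read as the number of rows per class. Column names are assumed to be unique. Note that pandas.factorize numbers N distinct labels from 0 to N-1.

```python
import numpy as np
import pandas as pd


def _clean_text(value):
    """Drop non-ASCII characters and surrounding spaces; empty text becomes NaN."""
    if not isinstance(value, str):
        return value
    cleaned = value.encode("ascii", "ignore").decode("ascii").strip()
    return cleaned if cleaned else np.nan


def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """One possible ordering of the cleaning steps (an assumption)."""
    df = df.copy()
    # Sanitize column names and every non-numeric cell.
    df.columns = [str(c).encode("ascii", "ignore").decode("ascii").strip()
                  for c in df.columns]
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].map(_clean_text)
    # Remove rows that carry no information at all.
    df = df.dropna(how="all")
    # Fill remaining gaps: 0 for numeric columns, the string "0" for text columns.
    for col in df.columns:
        fill = 0 if pd.api.types.is_numeric_dtype(df[col]) else "0"
        df[col] = df[col].fillna(fill)
    # Drop columns whose values duplicate an earlier column.
    # Transposing is costly on large frames; sample rows first if needed.
    keep = ~df.T.duplicated().to_numpy()
    return df.loc[:, keep]


def encode_labels(df: pd.DataFrame, label_col: str):
    """Replace the label column with integer codes; also return the mapping."""
    codes, classes = pd.factorize(df[label_col].astype(str), sort=True)
    out = df.copy()
    out[label_col] = codes
    mapping = {name: code for code, name in enumerate(classes)}
    return out, mapping


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between all numeric columns."""
    return df.select_dtypes(include="number").corr(method="pearson")


def plot_summary(df: pd.DataFrame, label_col: str):
    """Bar chart of rows per class next to a heat map of the correlation matrix."""
    import matplotlib.pyplot as plt  # lazy import: only the plots need it

    counts = df[label_col].value_counts().sort_index()
    corr = pearson_matrix(df)
    fig, (ax_dist, ax_corr) = plt.subplots(1, 2, figsize=(12, 5))
    counts.plot.bar(ax=ax_dist, title="Class distribution")
    image = ax_corr.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax_corr.set_xticks(range(len(corr.columns)))
    ax_corr.set_xticklabels(corr.columns, rotation=90)
    ax_corr.set_yticks(range(len(corr.columns)))
    ax_corr.set_yticklabels(corr.columns)
    ax_corr.set_title("Pearson correlation matrix")
    fig.colorbar(image, ax=ax_corr)
    fig.tight_layout()
    return fig
```

In a notebook, a typical sequence would be to load one dataset with pd.read_csv, call clean_dataset, pass the result to encode_labels with the name of that dataset's label column, and then call plot_summary on the encoded frame. The label column name differs between datasets, so it has to be checked in each file.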
The datasets are distributed as follows:

Dataset Name     Dataset link
NF-ToN-IoT-v2    https://staff.itee.uq.edu.au/marius/
NF-BoT-IoT-v2    https://staff.itee.uq.edu.au/marius/
CIC-ToN-IoT      https://staff.itee.uq.edu.au/marius/
CIC-BoT-IoT      https://staff.itee.uq.edu.au/marius/
NSL-KDD          https://www.unb.ca/cic/datasets/nsl.html
Kitsune          https://data.mendeley.com/datasets/zvsk3k9cf2/1
Deliverable: Python code .ipynb
Deadline: 14/03/2022
Student ID
201910747
202010232
202010959
202120057
202010590
202010914
202011140
202120055
202011025
202011108
202011465
202120058
202011134
202011639
202120059
202120063
202011669
202120051
202120053
202120064
202020278
202120060
202120069