EA-FEM2063-DA-MAY 2021 Questions

UNIVERSITI TEKNOLOGI PETRONAS
EXTENDED ASSIGNMENT
MAY 2021 SEMESTER
COURSE : FEM 2063 – DATA ANALYTICS
DATE : 1st AUGUST 2021
TIME : 9:00 AM (24 HOURS)
INSTRUCTIONS TO CANDIDATES
1. Extended Assignment (EA) is an open-book assessment. Students
   can refer to online resources, learning materials, textbooks, and other
   reading materials to answer the questions posted in the assessment.
2. Answer ALL questions.
3. The duration to complete the EA is TWENTY-FOUR (24) HOURS.
4. Students are allowed ONE (1) attempt to do the EA successfully where
   only ONE (1) duly completed EA submission is permitted. Multiple
   submissions are NOT allowed.
5. MAXIMUM file size for your EA submission to be uploaded to ULearn
   is 50MB.
6. Please upload your answers in ONE (1) PDF file.
7. Please make sure your answer in the PDF file is clear and readable
   and name your file as follows: "your name_your ID_EA Answer"
8. Late submissions and unclear or unreadable answers will not be accepted.
1. Select any CATEGORICAL dataset that contains at least 150 observations with
   THREE (3) attributes from any reliable source. Choose any THREE (3) of the
   following classification methods:
   i.   Logistic Regression (LR),
   ii.  Naïve Bayes (NB),
   iii. Linear Discriminant Analysis (LDA),
   iv.  K Nearest Neighbors (KNN),
   v.   Support Vector Machines (SVM),
   to perform detailed analyses of the selected dataset. Use the first 70% of the data
   to train the model and the remaining 30% to test the accuracy of the model. Explain
   your choices of attributes and discuss your results.
NOTES:
• The link to the selected dataset should be provided and the dataset should
  NOT have been used in the lectures or labs of the course.
• Any preprocessing method (e.g. removal or filling of empty cells) performed on
  the original data needs to be fully described and shown.
• Your analyses shall include descriptions of your Python code or any
  other software outputs that support the analyses.
[50 marks]
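
For illustration only, the workflow described in Question 1 might be sketched as follows. This is a minimal sketch and not a model answer: it assumes scikit-learn is available, and the data come from `make_classification` as a synthetic placeholder for whichever dataset is actually selected. The helper name `ordered_split` and the choice of LR, NB and KNN are likewise assumptions made for the example. Because the question asks for the first 70% of the rows, the split is positional rather than random; if the chosen dataset happens to be sorted by class, the class balance of the two parts is worth checking.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder (assumption): 150 observations, 3 attributes, 2 classes.
X, y = make_classification(n_samples=150, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)


def ordered_split(X, y, train_fraction=0.7):
    """Use the first `train_fraction` of the rows for training, the rest for testing."""
    n_train = int(round(train_fraction * len(X)))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


X_train, X_test, y_train, y_test = ordered_split(X, y)

# Three of the five listed methods; scaling is added for the
# scale-sensitive models (LR and KNN).
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression()),
    "NB": GaussianNB(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                      # train on the first 70%
    predictions = model.predict(X_test)              # predict the remaining 30%
    accuracies[name] = accuracy_score(y_test, predictions)
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

With a real dataset, `X` and `y` would be replaced by the three chosen attributes and the class label after any preprocessing has been applied and documented.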
2. Select any dataset that contains more than 300 observations with at least 10
   attributes from https://archive.ics.uci.edu or https://www.kaggle.com or any other
   online free data repository. Perform detailed analyses on the selected data by
   using ONE (1) data reduction method and ONE (1) clustering method of your
   choice. Explain your choices and discuss your results.
NOTES:
• The link and the description of the selected dataset should be provided, and
  the dataset should NOT have been used in the lectures or labs of the course.
• Describe the dataset information, such as the number of instances (rows),
  the number of features/attributes (columns), the area/domain/field, and/or
  missing value(s) if any.
• Any preprocessing method (e.g. removal or filling of empty cells) performed
  on the original data needs to be fully described and shown.
• Your analyses shall include descriptions of your Python code and
  plots.
[50 marks]
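
For illustration only, one possible pairing of a data reduction method with a clustering method for Question 2 is sketched below. It is a minimal sketch under stated assumptions, not a model answer: PCA and k-means are just one admissible choice, scikit-learn is assumed to be available, the data are a synthetic placeholder from `make_blobs`, and the numbers of components and clusters are arbitrary values picked for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder (assumption): 400 observations, 10 attributes.
X, _ = make_blobs(n_samples=400, n_features=10, centers=3, random_state=0)

# Standardise first so that no attribute dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Data reduction: project the 10 attributes onto 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_
print("Explained variance ratio per component:", explained)

# Clustering: k-means on the reduced data (k = 3 is an assumption here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# Silhouette score as one way to judge cluster separation (range -1 to 1).
score = silhouette_score(X_reduced, labels)
print(f"Silhouette score: {score:.3f}")
```

In an actual answer, the number of components would normally be justified from the explained variance and the number of clusters from a criterion such as the elbow method or the silhouette score, with a scatter plot of the reduced data coloured by `labels` supporting the discussion.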
-END OF PAPER-