UNIVERSITI TEKNOLOGI PETRONAS

EXTENDED ASSIGNMENT
MAY 2021 SEMESTER

COURSE : FEM 2063 – DATA ANALYTICS
DATE : 1st AUGUST 2021
TIME : 9:00 AM (24 HOURS)

INSTRUCTIONS TO CANDIDATES

1. Extended Assignment (EA) is an open-book assessment. Students can refer to online resources, learning materials, textbooks, and other reading materials to answer the questions posted in the assessment.
2. Answer ALL questions.
3. The duration to complete the EA is TWENTY-FOUR (24) HOURS.
4. Students are allowed ONE (1) attempt to do the EA successfully where only ONE (1) duly completed EA submission is permitted. Multiple submissions are NOT allowed.
5. MAXIMUM file size for your EA submission to be uploaded to ULearn is 50MB.
6. Please upload your answers in ONE (1) PDF file.
7. Please make sure your answer in the PDF file is clear and readable and name your file as follows: "your name_your ID_EA Answer"
8. Late submissions and unclear/unreadable answers will not be accepted.

1. Select any CATEGORICAL dataset that contains at least 150 observations with THREE (3) attributes from any reliable source. Choose any THREE (3) of the following classification methods

   i. Logistic Regression (LR),
   ii. Naïve Bayes (NB),
   iii. Linear Discriminant Analysis (LDA),
   iv. K Nearest Neighbors (KNN),
   v. Support Vector Machines (SVM),

   to perform detailed analyses of the selected dataset. Use the first 70% of the data to train the model and the remaining 30% to test the accuracy of the model. Explain your choices of attributes and discuss your results.

   NOTES:
   • The link to the selected dataset should be provided and the dataset should NOT have been used in the lectures or labs of the course.
   • Any preprocessing method (e.g. removal or filling of empty cells) performed on the original data needs to be fully described and shown.
   • Your analyses shall include the descriptions of your Python code or any other software outputs to support the analyses.

   [50 marks]

2. Select any dataset that contains more than 300 observations with at least 10 attributes from https://archive.ics.uci.edu or https://www.kaggle.com or any other online free data repository. Perform detailed analyses on the selected data by using ONE (1) data reduction method and ONE (1) clustering method of your choice. Explain your choices and discuss your results.

   NOTES:
   • The link and the description of the selected dataset should be provided, and the dataset should NOT have been used in the lectures or labs of the course.
   • Describe data set information such as number of instances/features/attributes/columns, number of dataset/rows, area/domain/field, and/or missing value(s) if any.
   • Any preprocessing method (e.g. removal or filling of empty cells) performed on the original data needs to be fully described and shown.
   • Your analyses shall include the descriptions of your Python code and plots.

   [50 marks]

-END OF PAPER-