Fall Semester 2020-2021
Data Mining Techniques
Assignment

Consider the following simple dataset. The target attribute is PlayTennis, which has values Yes or No for different Saturday mornings, depending on several other attributes of those mornings.

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rainy     Mild         High      Weak    Yes
5    Rainy     Cool         Normal    Weak    Yes
6    Rainy     Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rainy     Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rainy     Mild         High      Strong  No

A. k-Nearest Neighbors.
1. Design a simple distance metric for this space. Briefly justify your choice.
2. Perform 7-fold cross-validation with k=3 for this dataset. Show your work (there should be 7 iterations with 7 corresponding predictions/accuracies on the held-out folds, and a final result).

B. Naive Bayes.
1. Using the full dataset, induce the corresponding NB model. Show your result in the form of probability tables as we did in class.
2. What would your model predict for the following two Saturday mornings: <Oct 1, Overcast, Cool, High, Weak>, and <May 26, Sunny, Hot, Normal, Strong>? Show your work.

Overview

You are to construct a decision tree for the given data set using the ID3 decision tree algorithm in the Weka toolset. The dependent variable for the tree is ‘play’. Once you have constructed the tree, use the tree and the data Weka provides about the tree to answer the questions below.

Questions

1. Does the tree adequately describe the data? Why? Why not?
2. Use the decision tree to figure out the value for the dependent variable for the following instance:

outlook  temperature  humidity  windy  play
rainy    hot          high      false  ?

Justify your answer!

Instructions

Start the Weka explorer. Load the file using the ‘preprocess’ tab.
Switch tabs to ‘classify’ and select the ID3 algorithm with the ‘choose’ button (you will find ID3 under trees). Set the ‘test options’ to ‘use training set’. Make sure that ‘play’ is the attribute that shows below the test options. Hit the ‘start’ button – the big pane should display information about the decision tree built.
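
As a cross-check on the tree Weka displays, the split that ID3 places at the root can be recomputed by hand from entropy and information gain. The Python sketch below is an illustration only and is not part of the Weka workflow. It assumes that the file loaded into Weka contains the same 14 rows as the PlayTennis table in this assignment, and the names `ROWS`, `ATTRIBUTES`, `entropy`, and `information_gain` are hypothetical helpers introduced here.

```python
from collections import Counter
from math import log2

# Assumption: these rows mirror the PlayTennis table in this assignment.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rainy", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the class labels minus the weighted entropy after splitting."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Gain of each attribute on the full dataset; ID3 splits the root on the largest.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name:12s} gain = {gain:.3f}")
print("Attribute with the largest gain:", best)
```

If the attribute at the root of Weka's tree differs from `best`, it is worth checking that the loaded file really matches the table above before answering the questions.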