Uploaded by npnew7571

VL2020210105623 DA

advertisement
Fall Semester 2020-2021
Data Mining Techniques Assignment
Consider the following simple dataset. The target attribute is PlayTennis, which has
values Yes or No for different Saturday mornings, depending on several other attributes
of those mornings.
Day
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Outlook Temperature Humidity Wind PlayTennis
Sunny
Hot
High
Weak
No
Sunny
Hot
High Strong
No
Overcast
Hot
High
Weak
Yes
Rainy
Mild
High
Weak
Yes
Rainy
Cool
Normal Weak
Yes
Rainy
Cool
Normal Strong
No
Overcast
Cool
Normal Strong
Yes
Sunny
Mild
High
Weak
No
Sunny
Cool
Normal Weak
Yes
Rainy
Mild
Normal Weak
Yes
Sunny
Mild
Normal Strong
Yes
Overcast
Mild
High Strong
Yes
Overcast
Hot
Normal Weak
Yes
Rainy
Mild
High Strong
No
A. k-Nearest Neighbors.
1. Design a simple distance metric for this space. Briefly justify your choice.
2. Perform 7-fold cross-validation with k=3 for this dataset. Show your work (there
should be 7 iterations with 7 corresponding predictions/accuracies on the held-out
folds, and a final result).
B. Naive Bayes.
1. Using the full dataset, induce the corresponding NB model. Show your result in
the form of probability tables as we did in class.
2. What would your model predict for the following two Saturday mornings: <Oct 1,
Overcast, Cool, High, Weak>, and <May 26, Sunny, Hot, Normal, Strong>?
Show your work.
Overview
You are to construct a decision tree for the given data set using the ID3 decision tree
algorithm in the Weka toolset. The dependent variable for the tree is ‘play’. Once you
have constructed the tree use the tree and the data Weka provides about the tree to answer
the questions below.
Questions
1. Does the tree adequately describe the data? Why? Why not?
2. Use the decision tree to figure out the value for the dependent variable for the
following instance:
outlook
rainy
temperature
hot
humidity
high
windy
false
play
?
Justify your answer!
Instructions






Start the Weka explorer.
Load the file using the ‘preprocess’ tab.
Switch tabs to ‘classify’ and select the ID3 algorithm with the ‘choose’ button
(you will find ID3 under trees).
Set the ‘test options’ to ‘use training set’.
Make sure that ‘play’ is the attribute that shows below the test options.
Hit the ‘start’ button – the big pane should display information about the decision
tree built.
Download