Uploaded by Waqas Ahmad

ML HW

1. Suppose you are given the following 2-class data, where “+” represents the positive class and “−”
represents the negative class.
Which of the following would be the leave-one-out cross-validation error for k=5?
A) 2/14
B) 4/14
C) 6/14
D) 8/14
E) None of the above
Please select the correct answer and explain your work with words and/or diagram.
Answer:
B) For k=5, the leave-one-out cross-validation error is 4/14: 4 of the 14 points are
misclassified when each is held out in turn.
Which of the following values of k in k-NN would minimize the leave-one-out cross-validation
error?
A) 3
B) 5
C) Both give the same error
D) None of these
Please select the correct answer and explain your work with words and/or diagram
Answer:
B) k=5: 5-NN has the lowest leave-one-out cross-validation error.
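The comparison between values of k can be checked mechanically. The sketch below is only an illustration: the six points are made up (the dataset from the question is not reproduced here), and the helper names knn_predict and loocv_error are hypothetical. It holds out each point in turn, classifies it by majority vote among its k nearest remaining points, and counts the mistakes.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loocv_error(data, k):
    """Return (mistakes, n): how many points are misclassified when held out."""
    mistakes = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every point except the held-out one
        if knn_predict(rest, point, k) != label:
            mistakes += 1
    return mistakes, len(data)

# Hypothetical toy data: two well-separated clusters of three points each.
data = [
    ((0, 0), "+"), ((0, 1), "+"), ((1, 0), "+"),
    ((5, 5), "-"), ((5, 6), "-"), ((6, 5), "-"),
]

for k in (1, 3, 5):
    mistakes, n = loocv_error(data, k)
    print(f"k={k}: LOOCV error = {mistakes}/{n}")
```

On this toy data k=1 and k=3 make no mistakes, while k=5 misclassifies every point, because the five remaining points always contain three of the opposite class. Running the same function on the 14 points from the question would allow the 3-NN and 5-NN errors to be compared directly.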
Which of the following is true about k in k-NN in terms of bias?
A) When you increase k, the bias increases
B) When you decrease k, the bias increases
C) Can’t say
D) None of these
Please select the correct answer and explain your work with words and/or diagram.
Answer:
A) When we increase k, bias increases because the model becomes simpler; decreasing k lowers the bias.
Which of the following is true about k in k-NN in terms of variance?
A) When you increase k, the variance increases
B) When you decrease k, the variance increases
C) Can’t say
D) None of these
Please select the correct answer and explain your work with words and/or diagram.
Answer:
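B) When we decrease k, variance increases, because each prediction depends on fewer neighbors
and is therefore more sensitive to individual training points.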
Why might too large a value of k be bad for this dataset? Why might too small a value of k also
be bad? Please answer and explain with words and/or diagram.
Answer:
When k is too large, the model over-generalizes: each neighborhood includes points that belong to
more than one class, so predictions drift toward the overall majority class and produce errors.
When k is too small, there is not much room for error: each prediction depends on only a few
neighbors, so noise or any small change in the dataset can produce errors.
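Both failure modes can be seen in a small sketch. The points below are made up for illustration (they are not the dataset from the question), and knn_predict is a hypothetical helper. A single "-" point sits inside the "+" cluster to play the role of label noise.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: four "+" points, five "-" points far away,
# and one noisy "-" point placed inside the "+" cluster.
data = [
    ((0, 0), "+"), ((1, 0), "+"), ((0, 1), "+"), ((1, 1), "+"),
    ((5, 5), "-"), ((6, 5), "-"), ((5, 6), "-"), ((6, 6), "-"), ((7, 7), "-"),
    ((0.5, 0.4), "-"),  # label noise
]

query = (0.5, 0.5)  # lies in the middle of the "+" cluster

too_small = knn_predict(data, query, 1)          # follows the single noisy neighbor
moderate = knn_predict(data, query, 3)           # the noisy point is outvoted
too_large = knn_predict(data, query, len(data))  # ignores locality entirely

print(too_small, moderate, too_large)  # - + -
```

With k=1 the prediction copies the noisy neighbor; with k equal to the number of points the prediction is simply the overall majority class, whatever the query is.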
2. Suppose you are given the following data, where x and y are the two input variables and Class is
the dependent variable.
Below is a scatter plot that shows the above data in 2D space.
Suppose you want to predict the class of a new data point x=1, y=1 using Euclidean distance
in 3-NN. Which class does this data point belong to?
A) + Class
B) – Class
C) Can’t say
D) None of these
Answer:
A) The 3 nearest points are all +, as shown by the red circle in the figure above.
In the previous question, suppose you now use 7-NN instead of 3-NN. Which class will
x=1, y=1 belong to?
A) + Class
B) – Class
C) Can’t say
Answer:
B) – Class: 4 of the 7 nearest points are –, as shown by the blue circle in the figure.
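The way the prediction can flip between 3-NN and 7-NN can be sketched in code. The data table from the question is not reproduced here, so the points below are hypothetical, chosen only so that the three closest neighbors of (1, 1) are + and the next four are -; the function names are likewise made up.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two 2D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def classify(points, query, k):
    """Return the majority class among the k points nearest to query."""
    ranked = sorted(points, key=lambda item: euclidean(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical points, NOT the table from the question.
points = [
    ((1, 2), "+"), ((2, 1), "+"), ((0, 1), "+"),                   # distance 1 from (1, 1)
    ((1, 3), "-"), ((3, 1), "-"), ((-1, 1), "-"), ((1, -1), "-"),  # distance 2 from (1, 1)
    ((4, 4), "+"), ((5, 5), "+"), ((5, 4), "-"),                   # farther away
]

query = (1, 1)
print(classify(points, query, 3))  # +
print(classify(points, query, 7))  # -
```

Widening the neighborhood from 3 to 7 pulls in four - points, which outvote the three + points, mirroring the reasoning in the two answers above.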
3. Draw a picture of the false positive rate as a fraction as was done for precision and recall in
class:
FPR = FP / (FP + TN) = 3/10
Precision = TP / (TP + FP) = 5/8
Recall = TP / (TP + FN) = 5/12
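The three fractions can be reproduced from a single confusion matrix. The counts below are an assumption inferred from the fractions (TP=5 and FP=3 from the precision, then FN=7 and TN=7 so that the denominators come out to 12 and 10); the original picture may use different underlying counts.

```python
from fractions import Fraction

# Assumed confusion-matrix counts, inferred from the fractions in the answer.
tp, fp, fn, tn = 5, 3, 7, 7

fpr = Fraction(fp, fp + tn)        # share of actual negatives flagged positive
precision = Fraction(tp, tp + fp)  # share of predicted positives that are correct
recall = Fraction(tp, tp + fn)     # share of actual positives that are found

print(fpr, precision, recall)  # 3/10 5/8 5/12
```

Note that FPR and recall are both normalized by the actual class sizes (negatives and positives respectively), while precision is normalized by the number of positive predictions.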