Quiz CS273A W11

Quiz 1, ICS 273A Intro-ML
1. Bob is trying to predict stock-market prices. He downloads a dataset of two
months' worth of price values of some random stock. He decides to learn a classifier
that predicts whether the price of the stock will go up (+1) or down (-1) the next day.
From this training data he extracts a labeled dataset and trains his classifier. Bob's
classifier achieves 55% accuracy, where 50% can be achieved with random
guessing. Next, he will test his classifier on a test set. Which of the following
statements is true?
A) The classifier is barely performing better than random so it must be badly
overfitting.
B) The classifier is barely performing better than random so it must be badly
underfitting.
C) We cannot conclude whether the classifier is under- or overfitting from looking at
the training data alone.
2. Consider two nearest-neighbor classifiers based on "k" neighbors (k-NN) and "p"
neighbors (p-NN) respectively. Assume p > k. On any given training set, which of the
two classifiers will have a smoother decision boundary (i.e., which of the two
classifiers will be less prone to overfitting)?
A) k-NN
B) p-NN
C) There is no difference.
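As a sketch of the idea behind question 2, the toy example below (all data and labels are made up for illustration) shows a 1-D nearest-neighbor vote where a single mislabeled training point flips the 1-NN prediction in its neighborhood, while a larger k averages it away:

```python
# Sketch: 1-D k-NN on a toy set with one mislabeled (noisy) point.
# The data, labels, and query point are illustrative assumptions.

def knn_predict(x, data, k):
    """Predict the majority label of the k training points nearest to x."""
    neighbors = sorted(data, key=lambda pt: abs(pt[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return +1 if votes > 0 else -1

# Points below 0.5 are labeled -1 and above +1, except one noisy +1 at x=0.3.
train = [(0.1, -1), (0.2, -1), (0.3, +1), (0.4, -1),
         (0.6, +1), (0.7, +1), (0.8, +1), (0.9, +1)]

# 1-NN fits the noise: it carves out a +1 island around x=0.3.
print(knn_predict(0.31, train, k=1))   # -> 1 (follows the noisy neighbor)
# 5-NN averages over more neighbors and outvotes the lone noisy label.
print(knn_predict(0.31, train, k=5))   # -> -1 (smoother boundary)
```

A larger neighborhood averages over more training labels, so individual noisy points matter less and the decision boundary is smoother.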
3. Which of the following statements is true? (Choose only one.)
A) The “Perceptron” is a nonparametric classifier.
B) The “Perceptron” is a parametric classifier.
C) In semi-supervised learning we use both labeled and unlabeled data for training
classifiers.
D) Both A) and C) are correct.
E) Both B) and C) are correct.
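To illustrate the parametric/nonparametric distinction in question 3, here is a minimal perceptron sketch (toy data and learning rate are assumptions): the learned model is a fixed-size weight vector regardless of dataset size, whereas a nearest-neighbor classifier must store the entire training set.

```python
# Sketch: the perceptron's model is a fixed number of parameters
# (one weight per feature plus a bias), independent of dataset size.
# The toy data and learning rate below are illustrative assumptions.

def train_perceptron(data, epochs=20, lr=1.0):
    """Classic perceptron update rule on 2-D points with labels in {-1, +1}."""
    w = [0.0, 0.0]  # fixed-size parameter vector, whatever the dataset size
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

# A linearly separable toy set (separable by x2 > 0.5).
data = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((0.0, 1.0), +1), ((1.0, 1.0), +1)]
w, b = train_perceptron(data)
predict = lambda x1, x2: +1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
print([predict(x1, x2) for (x1, x2), _ in data])  # -> [-1, -1, 1, 1]
```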
4. Consider a regression problem with N data cases. Assume we use a quadratic
polynomial to fit the data. We measure estimation bias and variance. Does the
estimation variance increase or decrease when we increase the dataset size?
A) Increase
B) Decrease
5. Same question as above, but now we increase the order of the polynomial from
quadratic to cubic.
A) Increase
B) Decrease
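Both effects in questions 4 and 5 can be simulated. The sketch below (the true quadratic curve, noise level, and evaluation point are all assumptions) measures estimation variance as the spread of the fitted value at a fixed input across many regenerated datasets:

```python
# Sketch: estimation variance of a polynomial fit, measured as the
# variance of the fitted value at x=1.0 over many simulated datasets.
# The true curve (a quadratic) and noise level are assumptions.
import numpy as np

np.random.seed(0)

def fit_variance(n, deg, trials=300):
    """Variance of the fitted value at x=1.0 across repeated datasets of size n."""
    preds = []
    for _ in range(trials):
        x = np.random.uniform(0, 1, n)
        y = 1 + 2 * x - 3 * x**2 + np.random.normal(0, 0.5, n)  # quadratic truth + noise
        coeffs = np.polyfit(x, y, deg)
        preds.append(np.polyval(coeffs, 1.0))
    return float(np.var(preds))

# More data -> lower estimation variance (question 4).
print(fit_variance(20, deg=2), fit_variance(200, deg=2))
# Higher-order polynomial -> higher estimation variance (question 5).
print(fit_variance(50, deg=2), fit_variance(50, deg=3))
```

Averaging over more data cases stabilizes the fitted coefficients, while each extra polynomial degree adds a parameter that the noise can push around.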
6. Assume we have trained a Naïve Bayes classifier and are classifying a large
collection of test examples by comparing the posterior probabilities p(y=0|x) and
p(y=1|x) and picking the class with the highest posterior probability. We then
draw an ROC curve to show how well our algorithm worked. What will happen to
the ROC curve when we change the prior p(y=1) from 0.5 to 0.9 (and thus
p(y=0) = 0.1)?
A) It will change: it will either get better or worse depending on the distribution of
class labels in the test set.
B) It will not change.
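The key fact behind question 6 is that the posterior odds equal the likelihood ratio times the prior odds, so changing the prior transforms every posterior monotonically and leaves the ranking of test cases, and hence the ROC curve, unchanged. A small sketch (the likelihood ratios stand in for a hypothetical Naïve Bayes model):

```python
# Sketch: changing the class prior rescales every posterior monotonically,
# so the ranking of test cases (which determines the ROC curve) is unchanged.
# The likelihood ratios below are made-up stand-ins for a Naive Bayes model.

def posterior(lr, prior1):
    """p(y=1|x) given the likelihood ratio lr = p(x|y=1)/p(x|y=0)."""
    return prior1 * lr / (prior1 * lr + (1 - prior1))

likelihood_ratios = [0.2, 0.5, 1.0, 2.0, 8.0]  # hypothetical test cases

scores_a = [posterior(lr, 0.5) for lr in likelihood_ratios]
scores_b = [posterior(lr, 0.9) for lr in likelihood_ratios]

# The absolute probabilities differ, but the order is identical,
# and the ROC curve depends only on that order.
rank = lambda scores: sorted(range(len(scores)), key=lambda i: scores[i])
print(rank(scores_a) == rank(scores_b))  # -> True
```

What does change with the prior is the single operating point picked by the argmax rule (the 0.5-posterior threshold), not the curve traced out by sweeping all thresholds.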