Quiz 1, ICS 273A Intro-ML

1. Bob is trying to predict stock-market prices. He downloads a dataset of two months' worth of price values of some random stock. He decides to learn a classifier that predicts whether the price of the stock will go up (+1) or down (-1) the next day. From this training data he extracts a labeled dataset and trains his classifier. Bob's classifier achieves 55% accuracy, where 50% can be achieved with random guessing. Next, he will test his classifier on a test set. Which of the following statements is true?
A) The classifier is barely performing better than random, so it must be badly overfitting.
B) The classifier is barely performing better than random, so it must be badly underfitting.
C) We cannot conclude whether it is under- or overfitting from looking at the training data alone.

2. Consider two nearest-neighbor classifiers based on “k” neighbors (k-NN) and “p” neighbors (p-NN) respectively, and assume p > k. On any given training set, which of the two classifiers will have a smoother decision boundary (i.e., which of the two classifiers will be less prone to overfitting)? (A code sketch after the quiz illustrates this.)
A) k-NN
B) p-NN
C) There is no difference.

3. Which of the following statements is true (choose only one)?
A) The “Perceptron” is a nonparametric classifier.
B) The “Perceptron” is a parametric classifier.
C) In semi-supervised learning we use both labeled and unlabeled data for training classifiers.
D) Both A) and C) are correct.
E) Both B) and C) are correct.

4. Consider a regression problem with N data cases. Assume we use a quadratic polynomial to fit through the data, and we measure estimation bias and variance. Does the estimation variance increase or decrease when we increase the dataset size? (See the bias-variance sketch after the quiz.)
A) Increase
B) Decrease

5. Same question as above, but now we increase the order of the polynomial from quadratic to cubic.
A) Increase
B) Decrease

6. Assume we have trained a Naïve Bayes classifier and are classifying a large collection of test examples by comparing the posterior probabilities p(y=0|x) and p(y=1|x) and picking the class with the highest posterior probability. We then draw an ROC curve to show how well our algorithm worked. What will happen to the ROC curve when we change the prior p(y=1) from 0.5 to 0.9 (and thus p(y=0) from 0.5 to 0.1)? (See the ROC sketch after the quiz.)
A) It will change: it will either get better or worse depending on the distribution of class labels in the test set.
B) It will not change.
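
For Question 2, here is a minimal NumPy sketch on toy 1D data. The dataset, the grid, and the two neighbor counts are illustrative choices of our own, not part of the quiz; counting how often the predicted label flips along a dense grid serves as a rough proxy for how smooth the decision boundary is.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1D training data: two noisy, overlapping classes labeled -1 and +1.
X_train = np.concatenate([rng.normal(-1.0, 1.0, 50), rng.normal(1.0, 1.0, 50)])
y_train = np.concatenate([-np.ones(50), np.ones(50)])

def knn_predict(x_query, X, y, k):
    # Majority vote among the k nearest training points (ties go to +1).
    dists = np.abs(X[None, :] - x_query[:, None])   # shape (n_query, n_train)
    nearest = np.argsort(dists, axis=1)[:, :k]      # indices of the k nearest
    votes = y[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)

# Fewer sign changes of the prediction along the grid = smoother boundary.
grid = np.linspace(-4.0, 4.0, 1000)
for k in (1, 25):   # a small "k" versus a larger "p" with p > k
    preds = knn_predict(grid, X_train, y_train, k)
    flips = int(np.sum(preds[:-1] != preds[1:]))
    print(f"{k:2d}-NN: boundary sign changes on the grid: {flips}")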
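
For Questions 4 and 5, a small simulation can make the variance behavior concrete. The true function, noise level, dataset sizes, and repeat count below are illustrative assumptions: we refit a polynomial on many fresh datasets and measure how much the fitted curves vary across refits.

import numpy as np

rng = np.random.default_rng(1)
x_eval = np.linspace(-1.0, 1.0, 50)              # fixed evaluation points

def f_true(x):
    return 1.0 + 2.0 * x - 1.5 * x**2            # assumed ground-truth function

def fit_variance(n_cases, degree, n_repeats=500):
    # Refit a degree-`degree` polynomial on many fresh datasets of size
    # n_cases; return the average pointwise variance of the fitted curves.
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        x = rng.uniform(-1.0, 1.0, n_cases)
        y = f_true(x) + rng.normal(0.0, 0.5, n_cases)
        coeffs = np.polyfit(x, y, degree)
        preds[r] = np.polyval(coeffs, x_eval)
    return preds.var(axis=0).mean()

print("quadratic, N=20 :", fit_variance(20, 2))
print("quadratic, N=200:", fit_variance(200, 2))  # Question 4: larger N
print("cubic,     N=20 :", fit_variance(20, 3))   # Question 5: higher order

Note that preds.var(axis=0) measures only the estimation variance; measuring bias would additionally compare the average fitted curve against f_true.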
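
For Question 6, the sketch below builds posteriors with Bayes' rule on a synthetic problem (the Gaussian class-conditional densities and sample sizes are assumptions, not from the quiz) and traces ROC points under both priors, so the effect of changing p(y=1) can be inspected directly.

import numpy as np

rng = np.random.default_rng(2)

def gauss_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2), written out to keep the sketch NumPy-only.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Synthetic test set: p(x|y=0) = N(0,1), p(x|y=1) = N(1,1).
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
y = np.concatenate([np.zeros(500), np.ones(500)])

def roc_points(scores, labels):
    # (FPR, TPR) pairs obtained by sweeping a threshold over the scores.
    order = np.argsort(-scores)
    tpr = np.cumsum(labels[order]) / labels.sum()
    fpr = np.cumsum(1.0 - labels[order]) / (1.0 - labels).sum()
    return fpr, tpr

for prior1 in (0.5, 0.9):
    # Posterior p(y=1|x) via Bayes' rule under the chosen prior p(y=1).
    lik1, lik0 = gauss_pdf(x, 1.0, 1.0), gauss_pdf(x, 0.0, 1.0)
    post1 = prior1 * lik1 / (prior1 * lik1 + (1.0 - prior1) * lik0)
    fpr, tpr = roc_points(post1, y)
    print(f"p(y=1)={prior1}: first ROC points:",
          list(zip(fpr[:3].round(3), tpr[:3].round(3))))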