4.4
(a) 0.1
(b) 0.01
(c) 0.1^100
(d) Because of the curse of dimensionality, observations are far from one another in high dimensions. As a result, there are very few training observations near any given test observation.
(e) We need len^p = 0.1, so len = 0.1^(1/p). When p = 1, len = 0.1; when p = 2, len = 0.1^(1/2) ≈ 0.316; when p = 100, len = 0.1^(1/100) ≈ 0.977. (A numerical check appears at the end of this section.)

4.8
For K = 1, the training error of KNN is zero, since each training observation's nearest neighbor is itself. Because the average of the training and test error rates for KNN is 18%, its test error rate must be 2 × 18% − 0% = 36%, which is higher than the 30% test error rate of logistic regression. Consequently, we choose logistic regression.

4.10
(a) Compute the correlation matrix. Year and Volume appear to be strongly correlated.
(b) The intercept and Lag2 are statistically significant.
(c) According to the confusion matrix, when the market goes down, logistic regression makes the wrong prediction most of the time.
(d) 62.5% (a code sketch of this fit appears at the end of this section)
(e) 62.5%
(f) 58.65%
(g) 50%
(h) Logistic regression and LDA provide the best results on this data.
(i) Answers may vary.

5.3
(a) As explained on page 181 of the textbook, k-fold cross-validation involves randomly dividing the set of observations into k groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the method is fit on the remaining k − 1 folds. The mean squared error, MSE_1, is then computed on the observations in the held-out fold. This procedure is repeated k times; each time, a different group of observations is treated as the validation set. The process results in k estimates of the test error, MSE_1, MSE_2, ..., MSE_k, and the k-fold CV estimate is computed by averaging these values. (A short sketch of the procedure appears at the end of this section.)
(b) i) Advantage of k-fold cross-validation relative to the validation set approach: as explained on page 178, the validation estimate of the test error rate can be highly variable, depending on precisely which observations are included in the training set and which are included in the validation set. Moreover, the validation set error rate tends to overestimate the test error rate for the model fit on the entire data set. Disadvantage of k-fold cross-validation relative to the validation set approach: as explained on page 177, the validation set approach is conceptually simpler and easier to implement.
ii) As noted on page 181, LOOCV is a special case of k-fold CV with k = n. Advantage of k-fold cross-validation relative to LOOCV: LOOCV requires fitting the statistical learning method n times, which can be computationally expensive. Moreover, as explained on page 183, k-fold CV often gives more accurate estimates of the test error rate than LOOCV does. Disadvantage of k-fold cross-validation relative to LOOCV: if the main purpose is bias reduction, LOOCV should be preferred to k-fold CV, since it tends to have less bias.

5.4
We can use the bootstrap approach. The bootstrap works by repeatedly sampling observations with replacement from the original data set B times, for some large value of B, each time fitting a new model and obtaining its prediction at the value of the predictor of interest. The sample standard deviation of these B predictions is then our estimate of the standard deviation of the prediction. (A sketch appears at the end of this section.)

5.5
(a)-(c) Answers may vary.
(d) Including a dummy variable for student does not lead to a reduction in the test error rate.
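
The following is a small numerical check of the fractions in 4.4(a)-(c) and the hypercube side lengths in 4.4(e); the function names are only illustrative.

```python
# Sketch for 4.4: fraction of training observations available near a test
# point, and the hypercube side length needed to capture 10% of the data.

def fraction_kept(p, range_frac=0.1):
    """With features uniform on [0, 1], using the nearest `range_frac` of each
    feature's range keeps, on average, range_frac**p of the observations."""
    return range_frac ** p

def cube_side(p, target_frac=0.1):
    """Side length of a p-dimensional hypercube containing, on average, a
    fraction `target_frac` of the observations, since side**p = target_frac."""
    return target_frac ** (1.0 / p)

for p in (1, 2, 100):
    print(f"p={p:3d}: fraction kept = {fraction_kept(p):.3g}, "
          f"cube side for 10% = {cube_side(p):.3f}")
# p=  1: fraction kept = 0.1,    cube side for 10% = 0.100
# p=  2: fraction kept = 0.01,   cube side for 10% = 0.316
# p=100: fraction kept = 1e-100, cube side for 10% = 0.977
```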
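Below is a hedged sketch of the fit behind 4.10(d): logistic regression on the Weekly data with Lag2 as the only predictor, trained on 1990-2008 and evaluated on 2009-2010. It assumes the Weekly data set from the ISLR R package has been exported to a file named Weekly.csv; that file name is an assumption, while the column names (Year, Lag2, Direction) are those used in the textbook.

```python
# Sketch for 4.10(d): logistic regression of Direction on Lag2 (Weekly data),
# assuming the data set is available locally as "Weekly.csv".
import pandas as pd
import statsmodels.api as sm

weekly = pd.read_csv("Weekly.csv")                    # assumed export of ISLR's Weekly data
weekly["Up"] = (weekly["Direction"] == "Up").astype(int)

train = weekly["Year"] <= 2008                        # 1990-2008 training period
X_train = sm.add_constant(weekly.loc[train, ["Lag2"]])
X_test = sm.add_constant(weekly.loc[~train, ["Lag2"]])

fit = sm.Logit(weekly.loc[train, "Up"], X_train).fit(disp=0)
pred_up = (fit.predict(X_test) > 0.5).astype(int)     # classify using a 0.5 cutoff
accuracy = (pred_up == weekly.loc[~train, "Up"]).mean()
print(f"Fraction correct on 2009-2010: {accuracy:.4f}")  # the answer above reports 62.5%
```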
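A minimal sketch of the k-fold CV procedure described in 5.3(a), assuming X and y are NumPy arrays and `model` is a generic object with fit/predict methods (as in scikit-learn) evaluated under squared-error loss.

```python
# Sketch for 5.3(a): k-fold cross-validation estimate of the test MSE.
import numpy as np

def k_fold_cv_mse(model, X, y, k=10, seed=0):
    """Randomly split the observations into k folds; hold out each fold in
    turn, fit on the remaining k-1 folds, and average the k MSE estimates."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mses = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(X[train_idx], y[train_idx])
        mses.append(np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2))
    return np.mean(mses)   # the k-fold CV estimate of the test error
```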
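Finally, a sketch of the bootstrap procedure described in 5.4; again the model interface (fit/predict), the array inputs, and the point of interest x0 are assumptions.

```python
# Sketch for 5.4: bootstrap estimate of the standard deviation of a prediction.
import numpy as np

def bootstrap_prediction_sd(model, X, y, x0, B=1000, seed=0):
    """Resample the data with replacement B times, refit the model on each
    bootstrap sample, predict Y at x0, and return the SD of the B predictions."""
    n = len(y)
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # sample n observations with replacement
        model.fit(X[idx], y[idx])
        preds.append(model.predict(x0.reshape(1, -1))[0])
    return np.std(preds, ddof=1)                # estimated SD of the prediction at x0
```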