Multiple Choice Questions

Name of Faculty: Dr. Jayadevan R.
Name of Subject: Machine Learning
Year: BE   Branch: Computer (A-Batch)

U.N. = Unit Number (Syllabus)
D.L. = Difficulty Level: Easy (E), Medium (M), Hard (H)
B.T.L. = Bloom's Taxonomy Level (1, 2, 3, 4, 5, 6)

Q.1 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 25 spam mails and 75 non-spam mails. 20 spam mails contain the word 'buy' and 5 non-spam mails also contain the word 'buy'. If an e-mail contains the word 'buy', what is the likelihood that it is spam?
a. 20%   b. 25%   c. 75%   d. 80%

Q.2 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 120 e-mails: 40 spam mails and 80 non-spam mails. 15 spam mails contain the word 'cheap' and 10 non-spam mails also contain the word 'cheap'. If an e-mail contains the word 'cheap', what is the likelihood that it is non-spam?
a. 20%   b. 40%   c. 60%   d. 80%

Q.3 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 90 e-mails: 20 spam mails and 70 non-spam mails. 15 spam mails contain the word 'buy', and 10 of those 15 also contain the word 'cheap'. 5 non-spam mails contain the word 'buy' and another 10 non-spam mails contain the word 'cheap'. If an e-mail contains both the words 'buy' and 'cheap', what is the likelihood that it is spam?
a. 100%   b. 90%   c. 80%   d. 70%

Q.4 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails. 20 e-mails contain the word 'buy' and 30 e-mails contain the word 'cheap'. Make the naïve assumption that the two words are independent of each other. How many e-mails contain both the words?
a. 0   b. 5   c. 6   d. 10

Q.5 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 20 spam mails and 80 non-spam mails. 15 spam mails contain the word 'buy' and 12 spam mails contain the word 'cheap'. 10 non-spam mails contain the word 'buy' and 8 non-spam mails contain the word 'cheap'. Make the naïve assumption that the two words are independent of each other. If an e-mail contains both the words, what is the likelihood that it is spam?
a. 80%   b. 90%   c. 95%   d. 100%

Q.6 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 25 spam mails and 75 non-spam mails. The numbers of spam and non-spam mails containing the three words 'buy', 'cheap' and 'work' are given in the following table.

             Buy   Cheap   Work
Spam          20      15      5
Non-spam       5      10     30

If an e-mail contains all the 3 words, what is the likelihood that it is spam? (Make the naïve assumption that the words are independent of each other.)
a. 70%   b. 80%   c. 90%   d. 100%

Q.7 (U.N. IV, D.L. E, B.T.L. 1)
The Naive Bayes classifier works on the principle of:
a. Correlation
b. Conditional probability
c. Bayes theorem
d. Both (b) and (c) are correct

Q.8 (U.N. IV, D.L. E, B.T.L. 1)
The Naive Bayes algorithm falls under which category?
a. Regression   b. Supervised learning   c. Clustering   d. Optimization

Q.9 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following is the most popular application of the Naive Bayes classifier?
a. Text classification
b. Face recognition
c. Pattern classification
d. Function approximation

Q.10 (U.N. IV, D.L. E, B.T.L. 1)
Why is the Naive Bayes classifier naive?
a. It assumes that classes are dependent.
b. It assumes that the features of a class are dependent.
c. It assumes that classes are independent.
d. It assumes that the features of a class are independent.

Q.11 (U.N. IV, D.L. E, B.T.L. 2)
When you have continuous feature values, which type of Naive Bayes (NB) model will you use?
a. Gaussian NB   b. Bernoulli NB   c. Multinomial NB   d. All of the above

Q.12 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following is a disadvantage of the Naive Bayes classifier?
a. It cannot learn the relationships among the features.
b. Continuous feature data is assumed to be normally distributed.
c. Both (a) and (b).
d. None of the above.

Q.13 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following is an advantage of the Naive Bayes classifier?
a. It can successfully train on a small data set.
b. It is good for multiclass classification.
c. It is quick and simple, since it is naïve.
d. All of the above.
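Note for instructors: Q.1-Q.6 share one counting pattern, namely score(class) = count(class) x product of per-word fractions, normalized across classes. The following minimal Python sketch (the helper name spam_likelihood is mine, not part of the question bank) reproduces the computations for Q.1 and Q.5.

```python
# Naive Bayes "likelihood of spam" as used in Q.1-Q.6:
# score(class) = count(class) * product of P(word | class),
# then normalize across the two classes.

def spam_likelihood(n_spam, n_ham, spam_word_counts, ham_word_counts):
    """Posterior probability of spam given that all listed words occur.

    spam_word_counts / ham_word_counts: per-word counts of mails in
    each class containing that word (naive independence assumption).
    """
    spam_score = float(n_spam)
    ham_score = float(n_ham)
    for s_cnt, h_cnt in zip(spam_word_counts, ham_word_counts):
        spam_score *= s_cnt / n_spam
        ham_score *= h_cnt / n_ham
    return spam_score / (spam_score + ham_score)

# Q.1: 25 spam / 75 non-spam; 'buy' in 20 spam and 5 non-spam mails.
print(spam_likelihood(25, 75, [20], [5]))          # 0.8 -> option (d)

# Q.5: 20 spam / 80 non-spam; 'buy' in 15/10, 'cheap' in 12/8 mails.
print(spam_likelihood(20, 80, [15, 12], [10, 8]))  # 0.9 -> option (b)
```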
Q.14 (U.N. IV, D.L. M, B.T.L. 3)
Consider the data given in the following table.

           Yellow   Sweet   Long
Fruit-A       350     450     50
Fruit-B       400     300    350
Fruit-C        50     100     50

Assume that the total available quantity of Fruit-A is 650, of Fruit-B is 450 and of Fruit-C is 150. Apply the Naïve Bayes algorithm to predict the type (class) of a fruit which is yellow, sweet as well as long.
a. Fruit-A   b. Fruit-B   c. Fruit-C   d. All of the above

Q.15 (U.N. IV, D.L. M, B.T.L. 3)
Using the data given in Q.14, apply the Naïve Bayes algorithm to predict the type (class) of a fruit which is short and sour.
a. Fruit-A   b. Fruit-B   c. Fruit-C   d. All of the above

Q.16 (U.N. IV, D.L. M, B.T.L. 3)
The table given below shows the data available with a second-hand car dealer.

Colour   Type   Origin     Condition
Red      Car    Domestic   Good
Red      Car    Domestic   Bad
Red      Car    Domestic   Good
White    Car    Domestic   Bad
White    Car    Imported   Good
White    SUV    Imported   Bad
White    SUV    Imported   Good
White    SUV    Domestic   Bad
Red      SUV    Imported   Bad
Red      Car    Imported   Good

Using the Naïve Bayes algorithm, predict the condition of a vehicle with the following properties: {Red, Domestic, SUV}.
a. Good   b. Bad   c. Equal probability   d. Data not sufficient

Q.17 (U.N. IV, D.L. M, B.T.L. 3)
In Q.16, what is the value of the posterior probability corresponding to the condition 'Good'?
a. 0.50   b. 0.48   c. 0.24   d. 0.72

Q.18 (U.N. IV, D.L. M, B.T.L. 3)
In Q.16, what is the value of the posterior probability corresponding to the condition 'Bad'?
a. 0.50   b. 0.48   c. 0.24   d. 0.72

Q.19 (U.N. IV, D.L. E, B.T.L. 2)
Which Naïve Bayes (NB) model is commonly used for text/document classification?
a. Gaussian NB   b. Bernoulli NB   c. Multinomial NB   d. All of the above

Q.20 (U.N. IV, D.L. M, B.T.L. 2)
Check whether the following statements are true/false with respect to the Naïve Bayes classifier.
1. Very simple and easy to implement.
2. Needs less training data.
3. Handles both continuous and discrete data.
4. Highly scalable with the number of features and data points.
5. It can be used for real-time predictions.
6. Not sensitive to irrelevant features.
a. Only the first 4 statements are true.
b. Only the first 5 statements are true.
c. All the statements are false.
d. All the statements are true.

Q.21 (U.N. IV, D.L. E, B.T.L. 1)
What is the dimension of a hyperplane in a p-dimensional space?
a. p   b. p−1   c. p+1   d. p−2

Q.22 (U.N. IV, D.L. E, B.T.L. 2)
The effectiveness of an SVM depends upon:
a. Selection of the kernel
b. Kernel parameters
c. Soft margin parameter
d. All of the above

Q.23 (U.N. IV, D.L. E, B.T.L. 1)
Support vectors are the data points that lie closest to the decision surface.
a. True   b. False

Q.24 (U.N. IV, D.L. E, B.T.L. 1)
Which type of datasets are not suited for SVMs?
a. Small datasets
b. Medium-sized datasets
c. Large datasets
d. Size doesn't matter

Q.25 (U.N. IV, D.L. E, B.T.L. 2)
SVMs are less effective when:
a. The data is linearly separable
b. The data is noisy and contains overlapping points
c. The data is clean and ready to use
d. All of the above

Q.26 (U.N. IV, D.L. M, B.T.L. 2)
Suppose that you are using an RBF kernel in an SVM with a high gamma value. What does this signify?
a. The model would not be affected by the distance of points from the hyperplane for modelling.
b. The model would consider only the points close to the hyperplane for modelling.
c. The model would consider even far-away points from the hyperplane for modelling.
d. Both (b) and (c).
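Note for instructors: the gamma behaviour asked about in Q.26 can be demonstrated directly. A small sketch, assuming scikit-learn is installed; the toy data set and parameter values are illustrative choices of mine, not taken from the questions.

```python
# Effect of the RBF gamma parameter (Q.26): with high gamma, each
# support vector influences only nearby points, so the boundary hugs
# the training data; with low gamma the boundary is smoother.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

for gamma in (0.1, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    # Training accuracy climbs with gamma, a symptom of overfitting.
    print(f"gamma={gamma}: train accuracy={clf.score(X, y):.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```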
Q.27 (U.N. IV, D.L. E, B.T.L. 1)
The cost parameter in the SVM means:
a. The number of cross-validations to be made
b. The kernel to be used
c. The trade-off between misclassification and simplicity of the model
d. Both (a) and (b)

Q.28 (U.N. IV, D.L. E, B.T.L. 2)
If you achieve 100% accuracy on the training set but only 70% on the validation set, what should you look out for?
a. Overfitting   b. Underfitting   c. Model is perfect   d. Testing

Q.29 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you have trained an SVM with a linear decision boundary. After training the SVM, you find that your SVM model is underfitting. Which option will you consider for the next iteration?
a. Increase data points
b. Decrease data points
c. Decrease features
d. Increase features

Q.30 (U.N. IV, D.L. E, B.T.L. 2)
What is supposed to be done in terms of bias and variance for the situation mentioned in Q.29?
a. Increase both bias and variance
b. Reduce both bias and variance
c. Reduce the bias and increase the variance
d. Increase the bias and reduce the variance

Q.31 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you are dealing with a 4-class classification problem and you want to train an SVM model on the data. Assume that you are using the one-vs-all method. How many times do you need to train the SVM model in such a case?
a. 1   b. 2   c. 3   d. 4

Q.32 (U.N. IV, D.L. E, B.T.L. 2)
In Q.31, assume that there are only 2 classes. How many times do you need to train the SVM model in such a case?
a. 1   b. 2   c. 3   d. 4

Q.33 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you are using an SVM with a polynomial kernel of degree 2. You have applied this to your data and found that both the training and testing accuracy is 100%. Assume that you increase the complexity (the degree of the polynomial kernel). What will happen?
a. Underfitting   b. Overfitting   c. Nothing will happen   d. None of the above

Q.34 (U.N. IV, D.L. M, B.T.L. 1)
What is/are true about the kernel in SVM?
1. The kernel function maps low-dimensional data to a high-dimensional space.
2. It is a similarity function.
a. Only 1 is true
b. Only 2 is true
c. Both 1 and 2 are true
d. Both 1 and 2 are false

Q.35 (U.N. IV, D.L. E, B.T.L. 2)
A large value for the C-parameter in an SVM will result in:
a. A larger-margin hyperplane
b. A smaller-margin hyperplane
c. No hyperplane
d. It will not have any impact

Q.36 (U.N. IV, D.L. E, B.T.L. 1)
What do you mean by the 'margin' of a hyperplane?
a. Length of the hyperplane
b. Height of the hyperplane
c. Distance between the decision boundaries
d. Distance between the decision boundary and the origin

Q.37 (U.N. IV, D.L. M, B.T.L. 2)
Which of the following statements are true with respect to a linear SVM?
1. An SVM which is used to classify data which are linearly separable is called a linear SVM.
2. A linear SVM searches for a hyperplane with the maximum margin.
3. A linear SVM is often termed a maximal margin classifier.
a. Only 1 is true
b. Only 1 and 2 are true
c. All are false
d. All are true

Q.38 (U.N. IV, D.L. E, B.T.L. 1)
The complexity of a linear SVM classifier is characterized by:
a. The number of support vectors
b. Test data
c. The number of features
d. All of the above

Q.39 (U.N. IV, D.L. E, B.T.L. 2)
Check whether the data belonging to the two classes (+) and (−) in graphs (1) and (2) are linearly separable.
a. Both (1) and (2) are linearly separable
b. Only (1) is linearly separable
c. Only (2) is linearly separable
d. Both (1) and (2) are not linearly separable

Q.40 (U.N. IV, D.L. E, B.T.L. 2)
Which type of SVM classifiers are suitable for the classification of the data points shown in the second graph of Q.39?
a. Non-linear SVMs
b. Soft-margin SVMs
c. Both (a) and (b)
d. None of the above

Q.41 (U.N. IV, D.L. E, B.T.L. 1)
What is the purpose of the kernel trick in SVM?
a. To transform the problem from non-linear to linear
b. To transform the problem from regression to classification
c. To transform the data from non-linearly separable to linearly separable
d. To transform the problem from supervised to unsupervised learning

Q.42 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following hyperplanes will be selected by an SVM?
a. (a)   b. (b)   c. (c)   d. All

Q.43 (U.N. IV, D.L. E, B.T.L. 1)
Which of these points is a support vector to the hyperplane?
a. a   b. b   c. c   d. None of the above

Q.44 (U.N. IV, D.L. E, B.T.L. 1)
Support vector machines penalize a training data point for being on the wrong side of the decision boundary and beyond the margin.
a. True   b. False
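Note for instructors: the penalty in Q.44 is the standard soft-margin hinge loss. A minimal sketch (the function name and example values are mine) showing that only points inside the margin or on the wrong side incur a cost:

```python
# Soft-margin SVM hinge loss (Q.44): a labelled point with label
# y in {-1, +1} and decision value f(x) is penalized only when
# y * f(x) < 1, i.e. inside the margin or on the wrong side.
def hinge_loss(decision_value, label):
    return max(0.0, 1.0 - label * decision_value)

print(hinge_loss(2.5, +1))   # 0.0 -> correct side, outside the margin
print(hinge_loss(0.4, +1))   # 0.6 -> inside the margin: penalized
print(hinge_loss(-1.0, +1))  # 2.0 -> wrong side: penalized more
```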
Q.45 (U.N. IV, D.L. M, B.T.L. 2)
You trained a binary classifier model which gives very high accuracy on the training data but much lower accuracy on the validation data. Which of the following statement(s) is (are) true?
1. This is an instance of overfitting.
2. This is an instance of underfitting.
3. The training was not well regularized.
4. The training and validation examples are sampled from different distributions.
a. Only 1 and 4 are true
b. Only 1, 2 and 3 are true
c. Only 1, 3 and 4 are true
d. Only 1 and 3 are true

Q.46 (U.N. IV, D.L. M, B.T.L. 2)
Three different classifiers are trained on the same data. Their decision boundaries are shown above. Which of the following statements are true?
1. The leftmost classifier has high robustness, poor fit.
2. The leftmost classifier has poor robustness, high fit.
3. The rightmost classifier has poor robustness, high fit.
4. The rightmost classifier has high robustness, poor fit.
a. Only 1 and 3 are true
b. Only 2 and 4 are true
c. Only 1 and 4 are true
d. Only 2 and 3 are true

Q.47 (U.N. IV, D.L. E, B.T.L. 1)
Which of the following can only be used when the training data are linearly separable?
a. Non-linear SVM
b. Linear hard-margin SVM
c. Linear soft-margin SVM
d. All of the above

Q.48 (U.N. IV, D.L. M, B.T.L. 2)
Which of the following might be valid reasons for preferring an SVM over a neural network?
1. An SVM can automatically learn to apply a non-linear transformation on the input space; a neural net cannot.
2. An SVM can effectively map the data to an infinite-dimensional space; a neural net cannot.
3. An SVM should not get stuck in local minima, unlike a neural net.
4. The transformed representation constructed by an SVM is usually easier to interpret than that of a neural net.
a. Only 1 and 2
b. Only 2 and 3
c. Only 3 and 4
d. All are valid

Q.49 (U.N. IV, D.L. M, B.T.L. 2)
You are given a labelled binary classification data set with N data points and D features. Suppose that N < D. In training an SVM on this data set, which of the following kernels is likely to be most appropriate?
a. Linear kernel
b. Quadratic kernel
c. Higher-order polynomial kernel
d. RBF kernel

Q.50 (U.N. IV, D.L. M, B.T.L. 2)
Assume that you are training an RBF SVM with the parameters C (slack penalty) and γ (gamma). How should you tweak the parameters to reduce overfitting?
a. Increase C and/or reduce γ
b. Reduce C and/or increase γ
c. Reduce C and/or reduce γ
d. Reduce C only
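Note for instructors: the C/γ trade-off behind Q.50 can be checked with a validation split. A sketch, assuming scikit-learn is available; the data set and the two parameter pairs are illustrative choices of mine, and the exact accuracies will vary.

```python
# Reducing overfitting in an RBF SVM (Q.50): a smaller C tolerates
# more margin violations, and a smaller gamma smooths the decision
# boundary. Compare an overfit-prone setting with a regularized one.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for C, gamma in [(100.0, 1.0), (1.0, 0.01)]:  # overfit-prone vs regularized
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    print(f"C={C}, gamma={gamma}: train={clf.score(X_tr, y_tr):.2f}, "
          f"val={clf.score(X_val, y_val):.2f}")
```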