
ML-Unit-4-MCQ-BE-Comp-A

Multiple Choice Questions
Name of Faculty: Dr. Jayadevan R.
Name of Subject: Machine Learning
Year: BE
Branch: Computer A-Batch
U.N. = Unit Number (Syllabus)
D. L. = Difficulty Level (Easy (E), Medium (M), Hard (H))
B.T.L. = Bloom’s Taxonomy Level (1, 2, 3, 4, 5, 6)
Q.1 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 25 spam mails and 75 non-spam mails. 20 spam mails contain the word 'buy' and 5 non-spam mails also contain the word 'buy'. If an e-mail contains the word 'buy', what is the likelihood that it is spam?
a. 20%
b. 25%
c. 75%
d. 80%

Q.2 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 120 e-mails: 40 spam mails and 80 non-spam mails. 15 spam mails contain the word 'cheap' and 10 non-spam mails also contain the word 'cheap'. If an e-mail contains the word 'cheap', what is the likelihood that it is non-spam?
a. 20%
b. 40%
c. 60%
d. 80%

Q.3 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 90 e-mails: 20 spam mails and 70 non-spam mails. 15 spam mails contain the word 'buy', and 10 of those 15 also contain the word 'cheap'. 5 non-spam mails contain the word 'buy' and another 10 non-spam mails contain the word 'cheap'. If an e-mail contains the words 'buy' and 'cheap', what is the likelihood that it is spam?
a. 100%
b. 90%
c. 80%
d. 70%

Q.4 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails. 20 e-mails contain the word 'buy' and 30 e-mails contain the word 'cheap'. Make a naïve assumption that the two words are independent of each other. How many e-mails contain both words?
a. 0
b. 5
c. 6
d. 10

Q.5 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 20 are spam mails and 80 are non-spam. 15 spam mails contain the word 'buy' and 12 spam mails contain the word 'cheap'. 10 non-spam mails contain the word 'buy' and 8 non-spam mails contain the word 'cheap'. Make a naïve assumption that the two words are independent of each other. If an e-mail contains both words, what is the likelihood that it is spam?
a. 80%
b. 90%
c. 95%
d. 100%

Q.6 (U.N. IV, D.L. M, B.T.L. 3)
Assume that there are 100 e-mails: 25 spam mails and 75 non-spam mails. The numbers of spam and non-spam mails containing the three words 'buy', 'cheap' and 'work' are given in the following table. If an e-mail contains all three words, what is the likelihood that it is spam? (Make a naïve assumption that the words are independent of each other.)

          Spam   Non-spam
Buy        20        5
Cheap      15       10
Work        5       30

a. 70%
b. 80%
c. 90%
d. 100%

Q.7 (U.N. IV, D.L. E, B.T.L. 1)
The Naive Bayes classifier works on the principle of:
a. Correlation
b. Conditional probability
c. Bayes theorem
d. Both (b) and (c) are correct
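Q.1 to Q.6 all reduce to Bayes' theorem plus, where several words are involved, the naive independence assumption. The short Python sketch below (not part of the original question bank; the function and variable names are invented for illustration) reproduces the arithmetic for Q.1 and Q.5:

def posterior_spam(p_spam, p_words_given_spam, p_nonspam, p_words_given_nonspam):
    """P(spam | words) via Bayes' theorem with the naive independence assumption."""
    spam_score = p_spam
    for p in p_words_given_spam:
        spam_score *= p
    nonspam_score = p_nonspam
    for p in p_words_given_nonspam:
        nonspam_score *= p
    return spam_score / (spam_score + nonspam_score)

# Q.1: P(spam | 'buy')
print(posterior_spam(25/100, [20/25], 75/100, [5/75]))                # 0.8 -> 80%
# Q.5: 'buy' and 'cheap', assumed independent within each class
print(posterior_spam(20/100, [15/20, 12/20], 80/100, [10/80, 8/80]))  # 0.9 -> 90%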
Q.8 (U.N. IV, D.L. E, B.T.L. 1)
The Naive Bayes algorithm falls under which category?
a. Regression
b. Supervised learning
c. Clustering
d. Optimization

Q.9 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following is the most popular application of the Naive Bayes classifier?
a. Text classification
b. Face recognition
c. Pattern classification
d. Function approximation

Q.10 (U.N. IV, D.L. E, B.T.L. 1)
Why is the Naive Bayes classifier naive?
a. It assumes that classes are dependent.
b. It assumes that the features of a class are dependent.
c. It assumes that classes are independent.
d. It assumes that the features of a class are independent.

Q.11 (U.N. IV, D.L. E, B.T.L. 2)
When you have continuous feature values, which type of Naive Bayes (NB) model will you use?
a. Gaussian NB
b. Bernoulli NB
c. Multinomial NB
d. All of the above

Q.12 (U.N. IV, D.L. E, B.T.L. 2)
Which of the options is a disadvantage of the Naive Bayes classifier?
a. It cannot learn the relationships among the features.
b. Continuous feature data is assumed to be normally distributed.
c. Both (a) and (b).
d. None of the above.

Q.13 (U.N. IV, D.L. E, B.T.L. 2)
Which of the options is an advantage of the Naive Bayes classifier?
a. It can successfully train on a small data set.
b. It is good for multiclass classification.
c. It is quick and simple, since it is naïve.
d. All of the above.
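As a concrete illustration of Q.11, here is a minimal scikit-learn sketch using GaussianNB on continuous features; the toy data and all names are invented for the example:

# Gaussian NB models each continuous feature as a per-class Gaussian.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two classes of continuous 2-D points centred at 0 and 3.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = GaussianNB().fit(X, y)
print(model.predict([[0.2, -0.1], [2.8, 3.1]]))  # expected: [0 1]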
Q.14 (U.N. IV, D.L. M, B.T.L. 3)
Consider the data given in the following table.

          Yellow   Sweet   Long
Fruit-A     350     450     50
Fruit-B     400     300    350
Fruit-C      50     100     50

Assume that the total available quantity of Fruit-A is 650, of Fruit-B is 450 and of Fruit-C is 150. Apply the Naïve Bayes algorithm to predict the type (class) of a fruit which is Yellow, Sweet as well as Long.
a. Fruit-A
b. Fruit-B
c. Fruit-C
d. All of the above

Q.15 (U.N. IV, D.L. M, B.T.L. 3)
Using the data given in Q.14, apply the Naïve Bayes algorithm to predict the type (class) of a fruit which is short and sour.
a. Fruit-A
b. Fruit-B
c. Fruit-C
d. All of the above

Q.16 (U.N. IV, D.L. M, B.T.L. 3)
The table given below shows the data available with a second-hand car dealer.

Colour   Type   Origin     Condition
Red      Car    Domestic   Good
Red      Car    Domestic   Bad
Red      Car    Domestic   Good
White    Car    Domestic   Bad
White    Car    Imported   Good
White    SUV    Imported   Bad
White    SUV    Imported   Good
White    SUV    Domestic   Bad
Red      SUV    Imported   Bad
Red      Car    Imported   Good

Using the Naïve Bayes algorithm, predict the condition of a vehicle with the following properties: {Red, Domestic, SUV}.
a. Good
b. Bad
c. Equal probability
d. Data not sufficient

Q.17 (U.N. IV, D.L. M, B.T.L. 3)
In Q.16, what is the value of the posterior probability corresponding to the condition 'Good'?
a. 0.50
b. 0.48
c. 0.24
d. 0.72

Q.18 (U.N. IV, D.L. M, B.T.L. 3)
In Q.16, what is the value of the posterior probability corresponding to the condition 'Bad'?
a. 0.50
b. 0.48
c. 0.24
d. 0.72

Q.19 (U.N. IV, D.L. E, B.T.L. 2)
Which Naïve Bayes (NB) model is commonly used for text/document classification?
a. Gaussian NB
b. Bernoulli NB
c. Multinomial NB
d. All of the above

Q.20 (U.N. IV, D.L. M, B.T.L. 2)
Check whether the following statements are true/false with respect to the Naïve Bayes classifier.
1. Very simple and easy to implement.
2. Needs less training data.
3. Handles both continuous and discrete data.
4. Highly scalable with the number of features and data points.
5. It can be used for real-time predictions.
6. Not sensitive to irrelevant features.
a. Only the first 4 statements are true.
b. Only the first 5 statements are true.
c. All the statements are false.
d. All the statements are true.
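The class scores behind Q.14 can be checked with a few lines of Python. This sketch (ours, not the faculty's worked solution) treats the table entries as per-class attribute counts and the stated totals as class sizes:

# Naive Bayes class scores for Q.14: which fruit is Yellow, Sweet and Long?
counts = {  # per-class counts of fruits having each attribute
    "Fruit-A": {"yellow": 350, "sweet": 450, "long": 50,  "total": 650},
    "Fruit-B": {"yellow": 400, "sweet": 300, "long": 350, "total": 450},
    "Fruit-C": {"yellow": 50,  "sweet": 100, "long": 50,  "total": 150},
}
grand_total = sum(c["total"] for c in counts.values())  # 1250

scores = {}
for fruit, c in counts.items():
    score = c["total"] / grand_total            # prior P(class)
    for attr in ("yellow", "sweet", "long"):    # naive independence
        score *= c[attr] / c["total"]           # P(attr | class)
    scores[fruit] = score

print(max(scores, key=scores.get))  # Fruit-B has the highest score

Q.15 follows the same pattern with the complement counts (short = not long, sour = not sweet).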
Q.21 (U.N. IV, D.L. E, B.T.L. 1)
What is the dimension of a hyperplane in a p-dimensional space?
a. p
b. p−1
c. p+1
d. p−2

Q.22 (U.N. IV, D.L. E, B.T.L. 2)
The effectiveness of an SVM depends upon:
a. Selection of the kernel
b. Kernel parameters
c. Soft margin parameter
d. All of the above

Q.23 (U.N. IV, D.L. E, B.T.L. 1)
Support vectors are the data points that lie closest to the decision surface.
a. True
b. False

Q.24 (U.N. IV, D.L. E, B.T.L. 1)
Which type of datasets is not suited for SVMs?
a. Small datasets
b. Medium-sized datasets
c. Large datasets
d. Size doesn't matter

Q.25 (U.N. IV, D.L. E, B.T.L. 2)
SVMs are less effective when:
a. The data is linearly separable
b. The data is noisy and contains overlapping points
c. The data is clean and ready to use
d. All of the above

Q.26 (U.N. IV, D.L. M, B.T.L. 2)
Suppose that you are using an RBF kernel in an SVM with a high gamma value. What does this signify?
a. The model would not be affected by the distance of points from the hyperplane for modelling.
b. The model would consider only the points close to the hyperplane for modelling.
c. The model would consider even far-away points from the hyperplane for modelling.
d. Both (b) and (c).
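The gamma behaviour asked about in Q.26 can be observed empirically. A small sketch with scikit-learn's SVC on invented toy data; a high gamma makes each training point's influence very local, so the fitted boundary is driven by points near it:

# Effect of the RBF gamma parameter in an SVM (toy data for illustration).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # a simple nonlinear target

for gamma in (0.1, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    # Higher gamma -> more local influence -> tighter fit to the training set.
    print(gamma, clf.n_support_.sum(), clf.score(X, y))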
Q.27 (U.N. IV, D.L. E, B.T.L. 1)
The cost parameter in the SVM means:
a. The number of cross-validations to be made
b. The kernel to be used
c. The trade-off between misclassification and simplicity of the model
d. Both (a) and (b)

Q.28 (U.N. IV, D.L. E, B.T.L. 2)
If you achieve 100% accuracy on the training set but only 70% on the validation set, what should you look out for?
a. Overfitting
b. Underfitting
c. The model is perfect
d. Testing

Q.29 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you have trained an SVM with a linear decision boundary. After training the SVM, you find that your SVM model is underfitting. Which option will you consider for the next iteration?
a. Increase the number of data points
b. Decrease the number of data points
c. Decrease the number of features
d. Increase the number of features

Q.30 (U.N. IV, D.L. E, B.T.L. 2)
What is supposed to be done in terms of bias and variance for the situation mentioned in Q.29?
a. Increase both bias and variance
b. Reduce both bias and variance
c. Reduce the bias and increase the variance
d. Increase the bias and reduce the variance

Q.31 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you are dealing with a 4-class classification problem and you want to train an SVM model on the data. Assume that you are using the one-vs-all method. How many times do you need to train the SVM model in such a case?
a. 1
b. 2
c. 3
d. 4

Q.32 (U.N. IV, D.L. E, B.T.L. 2)
Assume that there are only 2 classes. How many times do you need to train the SVM model in such a case?
a. 1
b. 2
c. 3
d. 4

Q.33 (U.N. IV, D.L. E, B.T.L. 2)
Suppose that you are using an SVM with a polynomial kernel of degree 2. Now suppose that you have applied this to the data and found that both training and testing accuracy is 100%. Assume that you increase the complexity (the degree of the polynomial kernel). What will happen?
a. Underfitting
b. Overfitting
c. Nothing will happen
d. None of the above
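Q.31 and Q.32 can be verified directly: one-vs-all fits one binary SVM per class. A quick scikit-learn check on invented data:

# One-vs-all (one-vs-rest) trains one binary SVM per class.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = rng.integers(0, 4, size=80)  # 4 classes

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovr.estimators_))  # 4 -> one trained SVM per class (Q.31)
# With only 2 classes, a single SVM suffices (Q.32).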
Q.34 (U.N. IV, D.L. M, B.T.L. 1)
What is/are true about kernels in SVM?
1. A kernel function maps low-dimensional data to a high-dimensional space.
2. It is a similarity function.
a. Only 1 is true
b. Only 2 is true
c. Both 1 and 2 are true
d. Both 1 and 2 are false

Q.35 (U.N. IV, D.L. E, B.T.L. 2)
A large value for the C-parameter in an SVM will result in:
a. A larger-margin hyperplane
b. A smaller-margin hyperplane
c. No hyperplane
d. It will not have any impact

Q.36 (U.N. IV, D.L. E, B.T.L. 1)
What do you mean by the 'margin' of a hyperplane?
a. Length of the hyperplane
b. Height of the hyperplane
c. Distance between the decision boundaries
d. Distance between the decision boundary and the origin

Q.37 (U.N. IV, D.L. M, B.T.L. 2)
Which of the following statements are true with respect to a linear SVM?
1. An SVM which is used to classify data which are linearly separable is called a linear SVM.
2. A linear SVM searches for a hyperplane with the maximum margin.
3. A linear SVM is often termed a maximal margin classifier.
a. Only 1 is true
b. Only 1 and 2 are true
c. All are false
d. All are true

Q.38 (U.N. IV, D.L. E, B.T.L. 1)
The complexity of a linear SVM classifier is characterized by:
a. The number of support vectors
b. The test data
c. The number of features
d. All of the above

Q.39 (U.N. IV, D.L. E, B.T.L. 2)
Check whether the data belonging to the two classes (+) and (−) in the graphs are linearly separable. (The two graphs are not reproduced in this copy.)
a. Both (1) and (2) are linearly separable
b. Only (1) is linearly separable
c. Only (2) is linearly separable
d. Both (1) and (2) are not linearly separable

Q.40 (U.N. IV, D.L. E, B.T.L. 2)
Which type of SVM classifiers is suitable for the classification of the data points shown in the second graph of Q.39?
a. Non-linear SVMs
b. Soft margin SVMs
c. Both (a) and (b)
d. None of the above
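Q.35 can also be checked numerically: for a linear SVM the margin width is 2/||w||, and increasing C penalizes slack more heavily, which typically narrows the margin. A sketch on invented, slightly overlapping data:

# Q.35: larger C -> narrower margin (margin width = 2 / ||w|| for a linear SVM).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Two slightly overlapping blobs, so the soft margin actually matters.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 100.0):
    w = SVC(kernel="linear", C=C).fit(X, y).coef_[0]
    print(C, 2.0 / np.linalg.norm(w))  # margin width shrinks as C grows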
Q.41 (U.N. IV, D.L. E, B.T.L. 1)
What is the purpose of the kernel trick in SVM?
a. To transform the problem from nonlinear to linear
b. To transform the problem from regression to classification
c. To transform the data from nonlinearly separable to linearly separable
d. To transform the problem from supervised to unsupervised learning

Q.42 (U.N. IV, D.L. E, B.T.L. 2)
Which of the following hyperplanes will be selected by an SVM? (The figure showing hyperplanes (a), (b) and (c) is not reproduced in this copy.)
a. (a)
b. (b)
c. (c)
d. All

Q.43 (U.N. IV, D.L. E, B.T.L. 1)
Which of these points is a support vector to the hyperplane? (The figure showing points a, b and c is not reproduced in this copy.)
a. a
b. b
c. c
d. None of the above

Q.44 (U.N. IV, D.L. E, B.T.L. 1)
Support vector machines penalize a training data point for being on the wrong side of the decision boundary and beyond the margin.
a. True
b. False

Q.45 (U.N. IV, D.L. M, B.T.L. 2)
You trained a binary classifier model which gives very high accuracy on the training data but much lower accuracy on validation data. Which of the following statement(s) is (are) true?
1. This is an instance of overfitting.
2. This is an instance of underfitting.
3. The training was not well regularized.
4. The training and validation examples are sampled from different distributions.
a. Only 1 and 4 are true
b. Only 1, 2 and 3 are true
c. Only 1, 3 and 4 are true
d. Only 1 and 3 are true
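The kernel trick of Q.41 implicitly maps data into a space where a linear separator exists. This sketch makes the map explicit on XOR-like data (invented for the example), which no hyperplane separates in the original 2-D space:

# Kernel idea behind Q.41: map nonlinearly separable data to a space
# where a linear separator exists (an explicit map is shown for clarity;
# a kernel would do this implicitly).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR: not linearly separable in 2-D

linear = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C ~ hard margin
print(linear.score(X, y))  # < 1.0: no separating hyperplane in 2-D

# Adding the product feature x1*x2 makes the classes linearly separable.
X3 = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
print(SVC(kernel="linear", C=1000.0).fit(X3, y).score(X3, y))  # 1.0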
Q.46 (U.N. IV, D.L. M, B.T.L. 2)
Three different classifiers are trained on the same data, and their decision boundaries are shown in a figure (not reproduced in this copy). Which of the following statements are true?
1. The leftmost classifier has high robustness and poor fit.
2. The leftmost classifier has poor robustness and high fit.
3. The rightmost classifier has poor robustness and high fit.
4. The rightmost classifier has high robustness and poor fit.
a. Only 1 and 3 are true
b. Only 2 and 4 are true
c. Only 1 and 4 are true
d. Only 2 and 3 are true

Q.47 (U.N. IV, D.L. E, B.T.L. 1)
Which of the options can only be used when the training data are linearly separable?
a. Nonlinear SVM
b. Linear hard-margin SVM
c. Linear soft-margin SVM
d. All of the above

Q.48 (U.N. IV, D.L. M, B.T.L. 2)
Which of the following might be valid reasons for preferring an SVM over a neural network?
1. An SVM can automatically learn to apply a non-linear transformation on the input space; a neural net cannot.
2. An SVM can effectively map the data to an infinite-dimensional space; a neural net cannot.
3. An SVM should not get stuck in local minima, unlike a neural net.
4. The transformed representation constructed by an SVM is usually easier to interpret than that of a neural net.
a. Only 1 and 2
b. Only 2 and 3
c. Only 3 and 4
d. All are valid

Q.49 (U.N. IV, D.L. M, B.T.L. 2)
You are given a labelled binary classification data set with N data points and D features. Suppose that N < D. In training an SVM on this data set, which of the options is likely to be most appropriate?
a. Linear kernel
b. Quadratic kernel
c. Higher-order polynomial kernel
d. RBF kernel

Q.50 (U.N. IV, D.L. M, B.T.L. 2)
Assume that you are training an RBF SVM with the parameters C (slack penalty) and γ (gamma). How should you tweak the parameters to reduce overfitting?
a. Increase C and/or reduce γ
b. Reduce C and/or increase γ
c. Reduce C and/or reduce γ
d. Reduce C only
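For Q.50, reducing the slack penalty C and/or the RBF γ smooths the decision boundary. A sketch on invented noisy data comparing an overfit-prone setting with a regularized one:

# Q.50: reducing C and/or gamma regularizes an RBF SVM (toy noisy data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
# Circular target with 15% label noise, so a tight fit memorizes noise.
y = ((X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5) ^ (rng.random(300) < 0.15)).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for C, gamma in [(1000.0, 10.0), (1.0, 0.1)]:  # overfit-prone vs regularized
    clf = SVC(C=C, gamma=gamma).fit(X_tr, y_tr)
    print(C, gamma, clf.score(X_tr, y_tr), clf.score(X_va, y_va))
# Expect the (1000, 10) model to show a larger train-validation gap.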