
Assignment 3

Introduction to Machine Learning
Prof. B. Ravindran
1. Which of the following are differences between LDA and Logistic Regression?
(a) Logistic Regression is typically suited for binary classification, whereas LDA is directly
applicable to multi-class problems
(b) Logistic Regression is robust to outliers whereas LDA is sensitive to outliers
(c) both (a) and (b)
(d) None of these
Sol. (c)
Logistic regression uses the sigmoid function, and the output values are between 0 and 1, so it
is typically suited for binary classification. LDA can be used when there are multiple classes
present.
In Logistic Regression, the effect of outliers is dampened by the sigmoid transformation. In LDA, the objective function is based on distances to the class means, which can change drastically when outliers are present.
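As a quick illustration of the first point, a minimal sketch of the sigmoid (the function and threshold are standard; the sample inputs are arbitrary):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Because the output lies in (0, 1), binary classification reduces to
# thresholding the score at 0.5.
low = sigmoid(-4.0)   # close to 0 -> predict class 0
high = sigmoid(4.0)   # close to 1 -> predict class 1
```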
2. We have two classes in our dataset. The two classes have the same mean but different
variance.
(a) LDA can classify them perfectly.
(b) LDA can NOT classify them perfectly.
(c) LDA is not applicable in data with these properties
(d) Insufficient information
Sol. (b) LDA separates classes based on the difference between their means; if the means coincide, the classes will not be linearly separable.
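A short numpy sketch of why this fails (an illustration, not part of the assignment; the scatter matrix values are made up): Fisher's LDA projects onto w ∝ Sw⁻¹(μ1 − μ0), which vanishes when the class means coincide.

```python
import numpy as np

# Two classes with identical means but different variances.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([0.0, 0.0])
Sw = np.array([[1.0, 0.0], [0.0, 3.0]])  # example within-class scatter

# Fisher's discriminant direction: w proportional to Sw^{-1} (mu1 - mu0)
w = np.linalg.solve(Sw, mu1 - mu0)

# With equal means the direction is the zero vector: the projected class
# means coincide, so no threshold along w separates the classes.
```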
3. We have two classes in our dataset. The two classes have the same variance but different
mean.
(a) LDA can classify them perfectly.
(b) LDA can NOT classify them perfectly.
(c) LDA is not applicable in data with these properties
(d) Insufficient information
Sol. (d) Depending on the actual values of the mean and variance, the two classes may or
may not be linearly separable.
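A small sampled illustration of why the answer depends on the actual values (the seed, gap, and variances are arbitrary): with a shared variance that is small relative to the gap between means the samples are separable, while a large shared variance makes them overlap.

```python
import numpy as np

rng = np.random.default_rng(0)
gap = 5.0

# Same variance in both classes, small relative to the gap: a threshold
# between the two samples exists, so they are linearly separable.
a = rng.normal(0.0, 0.1, 100)
b = rng.normal(gap, 0.1, 100)
separable = a.max() < b.min()

# Same gap in means, but a large shared variance: the samples overlap,
# so no threshold classifies them perfectly.
c = rng.normal(0.0, 5.0, 100)
d = rng.normal(gap, 5.0, 100)
overlapping = c.max() > d.min()
```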
4. Given the following distribution of data points:
What method would you choose to perform Dimensionality Reduction?
(a) Linear Discriminant Analysis
(b) Principal Component Analysis
(c) Both LDA and/or PCA.
(d) None of the above.
Sol. (a)
PCA does not use class labels and treats all the points as draws from a single pool. The principal component will therefore be the vertical axis, as most of the variance lies along that direction. However, projecting all the points onto the vertical axis loses the critical information: the two classes become completely mixed. LDA, on the other hand, models each class with a Gaussian. This leads to a projection direction along the horizontal axis that retains the class information (the classes remain linearly separable).
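The argument can be sketched numerically (the synthetic data and its parameters are assumptions for illustration): on data with a large label-independent vertical spread and a horizontal gap between class means, the pooled top principal component comes out vertical, while Fisher's LDA direction comes out horizontal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two classes separated along the horizontal (x) axis, with a large
# class-independent spread along the vertical (y) axis.
X0 = rng.normal([0.0, 0.0], [0.3, 5.0], size=(200, 2))
X1 = rng.normal([4.0, 0.0], [0.3, 5.0], size=(200, 2))
X = np.vstack([X0, X1])

# PCA ignores labels: top eigenvector of the pooled covariance.
vals, vecs = np.linalg.eigh(np.cov(X.T))
pc1 = vecs[:, -1]  # direction of maximum variance -> (nearly) vertical

# Fisher's LDA direction: Sw^{-1} (mu1 - mu0) -> (nearly) horizontal.
Sw = np.cov(X0.T) + np.cov(X1.T)
w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
w /= np.linalg.norm(w)
```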
5. If
log((1 − p(x)) / (1 + p(x))) = β0 + βx,
what is p(x)?
(a) p(x) = (1 + e^(β0+βx)) / e^(β0+βx)
(b) p(x) = (1 + e^(β0+βx)) / (1 − e^(β0+βx))
(c) p(x) = e^(β0+βx) / (1 + e^(β0+βx))
(d) p(x) = (1 − e^(β0+βx)) / (1 + e^(β0+βx))
Sol. (d)
log((1 − p(x)) / (1 + p(x))) = β0 + βx
(1 − p(x)) / (1 + p(x)) = e^(β0+βx)
1 − p(x) = e^(β0+βx) + p(x) · e^(β0+βx)
1 − e^(β0+βx) = p(x) (1 + e^(β0+βx))
p(x) = (1 − e^(β0+βx)) / (1 + e^(β0+βx))
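The algebra can be checked numerically (the coefficient values are arbitrary): substituting answer (d) back into the left-hand side recovers β0 + βx.

```python
import math

b0, b1 = 0.3, 0.7  # arbitrary coefficients for the check

def p(x):
    t = math.exp(b0 + b1 * x)
    return (1.0 - t) / (1.0 + t)   # answer (d)

x = -1.5
# Substituting back: log((1 - p) / (1 + p)) simplifies to b0 + b1*x.
lhs = math.log((1.0 - p(x)) / (1.0 + p(x)))
```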
6. For the two classes '+' and '-' shown below, which line is the most appropriate for projecting the data points when performing LDA?
(a) Red
(b) Orange
(c) Blue
(d) Green
Sol. (c)
The blue line is parallel to the line joining the class means and will therefore maximize the distance between the projected means.
7. Which of these techniques do we use to optimise Logistic Regression:
(a) Least Square Error
(b) Maximum Likelihood
(c) (a) or (b) are equally good
(d) (a) and (b) perform very poorly, so we generally avoid using Logistic Regression
(e) None of these
Sol. (b) Logistic regression has no closed-form least-squares solution; its parameters are estimated by maximizing the likelihood of the observed labels. Refer to the lecture.
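A minimal sketch of the maximum-likelihood fit, assuming plain gradient ascent on the log-likelihood (the data, seed, step size, and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
probs = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.random(200) < probs).astype(float)   # labels drawn from the model

def log_likelihood(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

w = np.zeros(2)
ll_start = log_likelihood(w)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w += 0.01 * (X.T @ (y - p))    # gradient of the log-likelihood
ll_end = log_likelihood(w)
# ll_end > ll_start; with more data, w approaches true_w
```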
8. LDA assumes that the class data is distributed as:
(a) Poisson
(b) Uniform
(c) Gaussian
(d) LDA makes no such assumption.
Sol. (c)
Refer to lecture.
9. Suppose we have two variables, X and Y (the dependent variable), and we wish to find their relation. An expert tells us that the relation between the two has the form Y = m·e^X + c. Suppose samples of the variables X and Y are available to us. Is it possible to apply linear regression to this data to estimate the values of m and c?
(a) No.
(b) Yes.
(c) Insufficient information.
(d) None of the above.
Sol. (b)
Instead of using the independent variable directly, we can transform it by taking the exponential of each value. Thus, on the X-axis we plot values of e^X, and on the Y-axis we plot values of Y. Since the relation between the dependent variable and the transformed independent variable is linear, the slope m and intercept c can both be estimated using linear regression.
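A numpy sketch of this transformation (the true values, sample size, and noise level are made up): regressing Y on e^X recovers both m and c.

```python
import numpy as np

rng = np.random.default_rng(0)
m_true, c_true = 2.0, -3.0
X = rng.uniform(-1.0, 2.0, 50)
Y = m_true * np.exp(X) + c_true + rng.normal(0.0, 0.01, 50)

Z = np.exp(X)                                  # transformed regressor
A = np.column_stack([Z, np.ones_like(Z)])      # design matrix [e^X, 1]
(m_hat, c_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)
# m_hat is close to 2.0 and c_hat is close to -3.0
```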
10. What might happen to our logistic regression model if the number of features is more than
the number of samples in our dataset?
(a) It will remain unaffected
(b) It will not find a hyperplane as the decision boundary
(c) It will overfit
(d) None of the above
Sol. (c)
With more features than samples, the training data is almost always linearly separable, so the model can fit the training set (including its noise) perfectly and will overfit. Refer to the lecture.
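A numpy sketch of the separability behind this (dimensions, seed, and the least-squares construction are illustrative; this exhibits a separating hyperplane directly rather than running the logistic fit itself):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                        # more features than samples
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)   # labels are pure noise

# With d > n, X has full row rank almost surely, so Xw = y has an exact
# solution: the training set is linearly separable even for random labels.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_acc = np.mean(np.sign(X @ w) == y)
# train_acc == 1.0: an unregularized logistic regression can likewise drive
# its training loss toward zero here, memorizing the noise, i.e. overfitting.
```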