Assignment 3
Introduction to Machine Learning
Prof. B. Ravindran

1. Which of the following are differences between LDA and Logistic Regression?
(a) Logistic Regression is typically suited for binary classification, whereas LDA is directly applicable to multi-class problems
(b) Logistic Regression is robust to outliers, whereas LDA is sensitive to outliers
(c) Both (a) and (b)
(d) None of these
Sol. (c)
Logistic Regression uses the sigmoid function, whose output lies between 0 and 1, so it is typically suited for binary classification. LDA can be used directly when multiple classes are present. In Logistic Regression, the effect of outliers is dampened by the sigmoid transformation; in LDA, the objective function is based on distances, which can change drastically when outliers are present.

2. We have two classes in our dataset. The two classes have the same mean but different variance.
(a) LDA can classify them perfectly.
(b) LDA can NOT classify them perfectly.
(c) LDA is not applicable to data with these properties.
(d) Insufficient information.
Sol. (b)
If the classes have the same mean, they are not linearly separable: LDA projects the data onto a single direction, and identical class means give identical projected means, so no threshold separates the classes perfectly.

3. We have two classes in our dataset. The two classes have the same variance but different mean.
(a) LDA can classify them perfectly.
(b) LDA can NOT classify them perfectly.
(c) LDA is not applicable to data with these properties.
(d) Insufficient information.
Sol. (d)
Depending on the actual values of the mean and variance, the two classes may or may not be linearly separable.

4. Given the following distribution of data points:
[figure: two classes separated along the horizontal axis, each spread mainly along the vertical axis]
What method would you choose to perform Dimensionality Reduction?
(a) Linear Discriminant Analysis
(b) Principal Component Analysis
(c) Both LDA and/or PCA
(d) None of the above
Sol. (a)
PCA does not use class labels and treats all the points as instances of a single pool. The principal component will therefore be the vertical axis, since most of the variance lies along that direction.
However, projecting all the points onto the vertical axis discards critical information, and the two classes become completely mixed. LDA, on the other hand, models each class with a Gaussian; this yields a projection direction along the horizontal axis that retains the class information (the classes remain linearly separable).

5. If
log((1 − p(x)) / (1 + p(x))) = β0 + βx,
what is p(x)?
(a) p(x) = (1 + e^(β0+βx)) / e^(β0+βx)
(b) p(x) = (1 + e^(β0+βx)) / (1 − e^(β0+βx))
(c) p(x) = e^(β0+βx) / (1 + e^(β0+βx))
(d) p(x) = (1 − e^(β0+βx)) / (1 + e^(β0+βx))
Sol. (d)
log((1 − p(x)) / (1 + p(x))) = β0 + βx
(1 − p(x)) / (1 + p(x)) = e^(β0+βx)
1 − p(x) = e^(β0+βx) + p(x) · e^(β0+βx)
1 − e^(β0+βx) = p(x) · (1 + e^(β0+βx))
p(x) = (1 − e^(β0+βx)) / (1 + e^(β0+βx))

6. For the two classes '+' and '−' shown below, which line is the most appropriate for projecting the data points when performing LDA?
[figure: four candidate projection lines, coloured red, orange, blue, and green]
(a) Red
(b) Orange
(c) Blue
(d) Green
Sol. (c)
The blue line is parallel to the line joining the means of the two clusters, and projecting onto it therefore maximizes the distance between the projected means.

7. Which of these techniques do we use to optimise Logistic Regression?
(a) Least Square Error
(b) Maximum Likelihood
(c) (a) or (b) are equally good
(d) (a) and (b) perform very poorly, so we generally avoid using Logistic Regression
(e) None of these
Sol. (b)
Logistic Regression is fit by maximizing the likelihood of the observed labels. Refer to the lecture.

8. LDA assumes that the class data is distributed as:
(a) Poisson
(b) Uniform
(c) Gaussian
(d) LDA makes no such assumption.
Sol. (c)
Refer to the lecture.

9. Suppose we have two variables, X and Y (the dependent variable), and we wish to find their relation. An expert tells us that the relation between the two has the form Y = m·e^X + c. Suppose samples of the variables X and Y are available to us. Is it possible to apply linear regression to this data to estimate the values of m and c?
(a) No.
(b) Yes.
(c) Insufficient information.
(d) None of the above.
Sol. (b)
Note that linear regression can estimate both m and c once the independent variable is suitably transformed.
Instead of using the independent variable X directly, we can transform it by taking the exponential of each value. Thus, on the X-axis we plot the values of e^X, and on the Y-axis we plot the values of Y. Since the relation between the dependent variable and the transformed independent variable is linear, the slope m and the intercept c can be estimated using linear regression.

10. What might happen to our logistic regression model if the number of features is greater than the number of samples in our dataset?
(a) It will remain unaffected
(b) It will not find a hyperplane as the decision boundary
(c) It will overfit
(d) None of the above
Sol. (c)
With more features than samples, the training points are almost always linearly separable, so the model can fit the training data (including any noise) perfectly and will generalize poorly. Refer to the lecture.
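The transformation described in the solution to question 9 can be sketched as follows. This is a minimal illustration: the values m = 2 and c = 3 are arbitrary choices used only to generate synthetic, noise-free data, so that the fitted slope and intercept can be checked against them.

```python
import numpy as np

# Question 9: Y = m * e^X + c. Transform X -> e^X, then ordinary
# linear regression on (e^X, Y) recovers m (slope) and c (intercept).
m_true, c_true = 2.0, 3.0          # illustrative values, not from the assignment
X = np.linspace(0.0, 1.0, 20)
Y = m_true * np.exp(X) + c_true    # noise-free synthetic samples

Z = np.exp(X)                      # transformed independent variable
m_hat, c_hat = np.polyfit(Z, Y, 1) # least-squares slope and intercept

print(round(m_hat, 6), round(c_hat, 6))  # → 2.0 3.0
```

With noisy samples the recovered values would only approximate m and c, but the same transformed regression applies unchanged.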
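The points made in questions 7 and 10 can be demonstrated together with a small sketch: fit logistic regression by maximum likelihood (plain full-batch gradient ascent on the log-likelihood, not the exact optimizer from the lecture) on a dataset with more features than samples and completely random labels. The sample sizes, learning rate, and iteration count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# More features (50) than samples (10), as in question 10,
# with labels assigned at random (no real signal to learn).
X = rng.normal(size=(10, 50))
y = rng.integers(0, 2, size=10).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Maximum-likelihood fit (question 7): gradient ascent on the
# log-likelihood of the logistic model, gradient = X^T (y - p).
w = np.zeros(50)
b = 0.0
lr = 0.05
for _ in range(10_000):
    p = sigmoid(X @ w + b)
    w += lr * (X.T @ (y - p))
    b += lr * np.sum(y - p)

train_acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(train_acc)  # 1.0: the model fits even random labels perfectly
```

Because ten points in fifty dimensions are essentially always linearly separable, the unregularized fit drives training accuracy to 100% on pure noise, which is exactly the overfitting behaviour question 10 refers to.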