Machine Learning
Final Exam
Student ID:
Name:
102/6
I. True/False Questions (24%)
( ) 1. Principal components analysis (PCA) and linear discriminant analysis (LDA) are both supervised dimensionality reduction methods.
( ) 2. The k-means clustering algorithm is used to solve the supervised learning problem.
( ) 3. In nonparametric estimation, all we assume is that similar inputs have similar outputs.
( ) 4. Rule induction works similarly to tree induction, except that rule induction does a breadth-first search, whereas tree induction goes depth-first.
( ) 5. A decision tree is a hierarchical model using a divide-and-conquer strategy.
( ) 6. The Expectation-Maximization (EM) algorithm can be used to solve the unsupervised learning problem.
( ) 7. Entropy in information theory specifies the maximum number of bits needed to encode the classification accuracy of an instance.
( ) 8. To remove subtrees in a decision tree, postpruning is faster and prepruning is more accurate.
( ) 9. In semiparametric estimation, the density is written as a disjunction of a small number of parametric models.
( ) 10. Gradient descent is a simple and global method. When online training is used, it does not need to store the training set and can adapt as the task to be learned changes. However, gradient descent converges slowly.
( ) 11. When classes are Gaussian with a shared covariance matrix, the optimal discriminant is linear.
( ) 12. The locally linear embedding method can recover global nonlinear structure from locally linear fits.
II. Short Answer Questions
1. (4%) Can you explain what Isomap is? What is geodesic distance?
2. (3%) What is the difference between feature selection methods and feature extraction methods?
3. (3%) Draw two-class, two-dimensional data such that PCA and LDA find totally different directions.
4. (4%) Please explain the meaning of the nonparametric density estimation methods. What are their assumptions?
5. (4%) What are the differences between the parametric density estimation methods and the
semiparametric density estimation methods?
6. (2%) In the running mean smoother, we can fit a constant, a line, or a higher-degree polynomial
at a test point. How can we choose between them?
7. (2%) Please finish the following Expectation-Maximization (EM) algorithm.
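For reference only: the partially blanked listing from the original exam is not reproduced here. Below is a minimal, hedged sketch of EM for a one-dimensional Gaussian mixture (variable names and initialization are hypothetical, not the exam's listing).

# Minimal EM sketch for a 1-D Gaussian mixture (hypothetical example).
# E-step: compute responsibilities; M-step: re-estimate priors, means, variances.
import numpy as np

def em_gmm(x, k, n_iter=100, seed=0):
    x = np.asarray(x, float)
    rng = np.random.default_rng(seed)
    n = len(x)
    pi = np.full(k, 1.0 / k)                  # mixture priors
    mu = rng.choice(x, k, replace=False)      # initialize means from the data
    var = np.full(k, np.var(x))               # initialize variances
    for _ in range(n_iter):
        # E-step: responsibility h[t, i] = P(G_i | x^t)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        h = pi * dens
        h /= h.sum(axis=1, keepdims=True)
        # M-step: update parameters from the soft assignments
        nk = h.sum(axis=0)
        pi = nk / n
        mu = (h * x[:, None]).sum(axis=0) / nk
        var = (h * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var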
8. (4%) Please finish the following k-means clustering algorithm.
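For reference only: a hedged sketch of standard k-means (not the exam's blanked listing; names and the stopping test are hypothetical). It alternates between assigning points to the nearest center and recomputing each center as the mean of its assigned points.

# Minimal k-means sketch (hypothetical example).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        # assignment step: label each point with its closest center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # update step: move each center to the mean of its cluster
        new_centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centers, centers):
            break                                        # converged
        centers = new_centers
    return centers, labels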
9. (2%) The Condensed Nearest Neighbor algorithm is used to find a subset Z of X that is small and accurate in classifying X. Please finish the following Condensed Nearest Neighbor algorithm.
10. (3%) Please show the properties that an impurity measure of a classification tree should satisfy.
11. (3%) Given a two-dimensional dataset as follows, please show the dendrogram of the complete-link clustering result. The complete-link distance between two groups $G_i$ and $G_j$ is
$d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$, where $d(x^r, x^s) = \sum_{j=1}^{d} \left| x^r_j - x^s_j \right|$.
12. (3%) Please estimate the density function with h = 0.5 by the histogram estimator
$\hat{p}(x) = \dfrac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$.
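For reference only: a minimal sketch of the histogram estimator above with hypothetical 1-D data; the bin origin is assumed to be 0.

# Histogram density estimator sketch (hypothetical example).
import numpy as np

def hist_estimate(x, data, h=0.5, origin=0.0):
    data = np.asarray(data, float)
    N = len(data)
    bin_idx = np.floor((x - origin) / h)                 # index of the bin containing x
    in_bin = np.floor((data - origin) / h) == bin_idx    # which x^t share that bin
    return in_bin.sum() / (N * h)                        # #{x^t in same bin} / (N h)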
13. (3%) In nonparametric regression, given a running mean smoother as follows, please finish the graph with h = 3.
$\hat{g}(x) = \dfrac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}$, where $b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin with } x \\ 0 & \text{otherwise} \end{cases}$
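For reference only: a sketch of the smoother defined above, interpreting "$x^t$ in the same bin with $x$" as a window of half-width h centered at the query point (an assumption); data and names are hypothetical.

# Running mean smoother sketch (hypothetical example).
import numpy as np

def running_mean(x, xs, rs, h=3.0):
    xs, rs = np.asarray(xs, float), np.asarray(rs, float)
    b = np.abs(xs - x) <= h        # b(x, x^t): 1 inside the window, 0 otherwise
    if not b.any():
        return np.nan              # no training point falls in the window
    return rs[b].mean()            # average of r^t over the window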
14. (6%) Given the regression tree as follows: (1) please draw its corresponding regression result; (2) show one rule extracted from this regression tree.
15. (2%) What is a multivariate tree?
16. (4%) In the pairwise separation example as follows, $H_{ij}$ indicates the hyperplane that separates the examples of $C_i$ from the examples of $C_j$:
$g_{ij}(x \mid w_{ij}, w_{ij0}) = w_{ij}^T x + w_{ij0}$
$g_{ij}(x) \begin{cases} > 0 & \text{if } x \in C_i \\ \le 0 & \text{if } x \in C_j \\ \text{don't care} & \text{otherwise} \end{cases}$
Choose $C_i$ if $\forall j \ne i,\ g_{ij}(x) > 0$.
Please decide which class each region belongs to.
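For reference only: a sketch of the pairwise decision rule above. With K classes and one linear discriminant per pair, x is assigned to $C_i$ when $g_{ij}(x) > 0$ for all $j \ne i$; the parameter arrays W and w0 are hypothetical.

# Pairwise-separation decision sketch (hypothetical example).
import numpy as np

def choose_class(x, W, w0):
    # W has shape (K, K, d) and w0 has shape (K, K), with g_ji = -g_ij assumed.
    K = W.shape[0]
    for i in range(K):
        g = [W[i, j] @ x + w0[i, j] for j in range(K) if j != i]
        if all(v > 0 for v in g):
            return i               # x is on the C_i side of every hyperplane H_ij
    return None                    # x falls in a "don't care" / rejected region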
17. (4%) Given a classification tree construction algorithm as follows, where $\mathcal{I}_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i$ (eq. 9.3) and $\mathcal{I}'_m = -\sum_{j=1}^{n} \dfrac{N_{mj}}{N_m} \sum_{i=1}^{K} p_{mj}^i \log_2 p_{mj}^i$ (eq. 9.8).
Can you explain what the function “SplitAttribute” does?
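For reference only: a hedged sketch of what a SplitAttribute-style routine does (the exam's listing is not reproduced, and discrete attributes are assumed): for each attribute, compute the post-split impurity of eq. 9.8 and return the attribute whose split gives the lowest impurity.

# SplitAttribute sketch using entropy impurity (hypothetical example).
import numpy as np

def entropy(labels):
    # I_m = - sum_i p_m^i log2 p_m^i  (eq. 9.3)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def split_attribute(X, y):
    best_attr, best_imp = None, np.inf
    for j in range(X.shape[1]):                     # try each (discrete) attribute
        imp = 0.0
        for v in np.unique(X[:, j]):                # each branch of the split
            mask = X[:, j] == v
            imp += mask.mean() * entropy(y[mask])   # (N_mj / N_m) * branch entropy (eq. 9.8)
        if imp < best_imp:
            best_attr, best_imp = j, imp
    return best_attr, best_imp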
III. Calculation and Proof Questions
1. (10%) Given a sample of two classes, $X = \{x^t, r^t\}_t$, where $r^t = 1$ if $x^t \in C_1$ and $r^t = 0$ if $x^t \in C_2$. In logistic discrimination, assuming that the log likelihood ratio is linear in the two-class case, the estimator of $P(C_1 \mid x)$ is the sigmoid function
$y = \hat{P}(C_1 \mid x) = \dfrac{1}{1 + \exp\left[-(w^T x + w_0)\right]}$
We assume $r^t$, given $x^t$, is Bernoulli distributed. Then the sample likelihood is
$l(w, w_0 \mid X) = \prod_t (y^t)^{r^t} (1 - y^t)^{1 - r^t}$,
and the cross-entropy is
$E(w, w_0 \mid X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$.
Please find the update equations of $\Delta w_j$ and $\Delta w_0$, where $\Delta w_j = -\eta \dfrac{\partial E}{\partial w_j}$ and $\Delta w_0 = -\eta \dfrac{\partial E}{\partial w_0}$, $j = 1, \ldots, d$.
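For reference only: a sketch of gradient descent for this two-class logistic discrimination model, using the standard gradient of the cross-entropy above, $\partial E / \partial w_j = -\sum_t (r^t - y^t) x_j^t$; the learning rate, iteration count, and data are hypothetical.

# Gradient-descent sketch for logistic discrimination (hypothetical example).
import numpy as np

def train_logistic(X, r, eta=0.1, n_iter=1000):
    N, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-(X @ w + w0)))   # sigmoid estimate of P(C1 | x)
        err = r - y                               # r^t - y^t
        w += eta * X.T @ err                      # delta w_j = eta * sum_t (r^t - y^t) x_j^t
        w0 += eta * err.sum()                     # delta w_0 = eta * sum_t (r^t - y^t)
    return w, w0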
2. (10%) Using principal components analysis, we can find a low-dimensional space such that when x is projected onto it, information loss is minimized. Let the projection of x on the direction of w be z = wTx. PCA finds w such that Var(z) is maximized,
Var(z) = wT ∑ w,
where Var(x) = E[(x – μ)(x – μ)T] = ∑.
If z1 = w1Tx with Cov(x) = ∑, then Var(z1) = w1T ∑ w1, and we maximize Var(z1) subject to ||w1|| = 1.
Please show that the first principal component is the eigenvector of the covariance matrix of the input sample with the largest eigenvalue.
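For reference only: a numerical sketch of the claim, with hypothetical data. The leading eigenvector of the sample covariance matrix is the unit direction whose projection has maximal variance, so Var(z1) equals the largest eigenvalue.

# Numerical check that the first principal component is the top eigenvector.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=1000)   # hypothetical sample

S = np.cov(X, rowvar=False)                 # sample covariance of x
eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns eigenvalues in ascending order
w1 = eigvecs[:, -1]                         # eigenvector with the largest eigenvalue
z1 = X @ w1                                 # projection z1 = w1^T x
print(np.var(z1, ddof=1), eigvals[-1])      # Var(z1) equals the largest eigenvalue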