Dr. Eick
COSC 6342 “Machine Learning” Homework 2&3 Spring 2014
Last updated: April 2, 9 a.m.
Deadline: Tue., April 15, 11 p.m.
11) K-Means and EM (Ungraded)
a) What is the meaning of b_it and h_it in the minimization process of K-means’/EM’s
objective function? What constraint holds with respect to h_it and b_it? How is b_it
computed by EM?
Remark: b_it denotes b_i^t and h_it denotes h_i^t.
b) Why do K-means and EM employ an iterative optimization procedure rather than
differentiating the objective function and setting its gradient to 0 to derive the optimal
clustering? What is the disadvantage of using an iterative optimization procedure?
c) EM is called “a soft clustering algorithm”—what does this mean?
d) Summarize in natural language what computations EM performs during its M-step!
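To make these quantities concrete, here is a minimal 1-D sketch (with made-up data and two Gaussian components, not from the assignment) in which the E-step computes the soft labels h_it and the M-step re-estimates the mixture parameters from them; note that each example’s h_it values sum to 1:

```python
import math

# Toy 1-D data and an assumed 2-component Gaussian mixture (illustrative values).
X = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
pi = [0.5, 0.5]          # mixture weights
mu = [0.0, 10.0]         # component means
var = [1.0, 1.0]         # component variances

def gauss(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

for _ in range(20):
    # E-step: h_it = posterior probability that example t belongs to component i;
    # for each t the h_it sum to 1 over the components (the "soft" constraint).
    H = []
    for x in X:
        p = [pi[i] * gauss(x, mu[i], var[i]) for i in range(2)]
        s = sum(p)
        H.append([pj / s for pj in p])
    # M-step: re-estimate weights, means, and variances from the soft assignments.
    for i in range(2):
        n_i = sum(h[i] for h in H)
        pi[i] = n_i / len(X)
        mu[i] = sum(h[i] * x for h, x in zip(H, X)) / n_i
        var[i] = sum(h[i] * (x - mu[i]) ** 2 for h, x in zip(H, X)) / n_i + 1e-9

print(mu)  # the means move toward the two data clumps
```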
12) Comparing Results of K-Means and EM
Apply k-means and EM with k=5 to the Iris Flower Dataset; run each algorithm twice; in
the case of EM, additionally explore the impact of alternative input parameters; compare
the results; assess the differences in the results, if there are any. Based on this experiment,
assess the strengths/weaknesses of each algorithm.
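One way to set this experiment up is sketched below, assuming scikit-learn is available (KMeans and GaussianMixture are its implementations of k-means and EM; the seeds and covariance_type choice are arbitrary illustrations of "alternative input parameters"):

```python
# Sketch: k-means vs. EM (GaussianMixture) on Iris with k = 5, each run twice.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X = load_iris().data

for seed in (0, 1):
    km = KMeans(n_clusters=5, n_init=10, random_state=seed).fit(X)
    # covariance_type is one EM input parameter worth varying (e.g. "full" vs "diag").
    gm = GaussianMixture(n_components=5, covariance_type="full",
                         random_state=seed).fit(X)
    gm_labels = gm.predict(X)
    # The adjusted Rand index quantifies agreement between the two clusterings.
    print(seed, adjusted_rand_score(km.labels_, gm_labels))
```

Comparing the label agreement across seeds gives a quantitative handle on how stable each algorithm’s result is.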
13) Non-Parametric Density Estimation 1 (Ungraded)
Assume we have a one dimensional dataset containing values {2, 3, 7, 8, 9, 12}
i. Assume h=2 for all questions (formula 8.2); compute p(x) using equation 8.2 for x=6.5 and x=10.
ii. Now compute the same densities using Silverman’s naïve estimator
(formula 8.4)!
iii. Now assume we use a Gaussian Kernel Estimator (equation 8.7); give a verbal description and a formula for how this estimator measures the density for x=10.
iv. Compare the 3 density estimation approaches; what are the main differences and advantages for each approach?
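For checking hand computations, here is a sketch of two of the estimators in their common textbook forms; the exact conventions (open vs. closed windows, equations 8.2/8.4/8.7) should be verified against the textbook, so treat the boundary cases with care:

```python
import math

X = [2, 3, 7, 8, 9, 12]
h = 2.0
N = len(X)

def naive(x):
    # Naive (box-kernel) estimator: count examples within h of x; each example
    # spreads its mass uniformly over an interval of width 2h.
    return sum(1 for xt in X if abs(x - xt) < h) / (2 * N * h)

def gaussian(x):
    # Gaussian kernel estimator: a smooth bump of bandwidth h around each
    # example, so every x^t contributes to the density at x.
    return sum(math.exp(-((x - xt) / h) ** 2 / 2) / math.sqrt(2 * math.pi)
               for xt in X) / (N * h)

print(naive(6.5), naive(10.0))  # box-kernel densities at the two query points
print(gaussian(10.0))           # smooth density estimate at x = 10
```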
14) Non-parametric Density Estimation 2
a) Assume a dataset X = {x^t, r^t} consisting of 4 examples (0,1), (1,3), (2,7), (4,1) is given
and the bin-width is 2.5; assume that x and x’ belong to the same bin if |x − x’| ≤ 2.5.
a1) Compute the values (also give the formula) for the regressogram for inputs 0.5, 1.8,
and 4.4 for the mean smoother (see formula 8.19 on page 175 of the textbook).
ĝ(0.5)=
ĝ(1.8)=
ĝ(4.4)=
Now assume the bin-width is only 1. Recompute the prediction for input 1.8!
ĝ(1.8)=
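A small sketch for checking these values, assuming "same bin" means |x − x^t| ≤ h as stated above (verify the exact bin convention against formula 8.19):

```python
# Mean-smoother regressogram sketch: g(x) = average of r^t over the training
# points x^t that fall in the same bin as x.
data = [(0, 1), (1, 3), (2, 7), (4, 1)]

def g(x, h):
    # Collect the outputs r^t of all examples whose input lies within h of x.
    rs = [r for xt, r in data if abs(x - xt) <= h]
    return sum(rs) / len(rs) if rs else None  # undefined on empty bins

for x in (0.5, 1.8, 4.4):
    print(x, g(x, 2.5))   # bin-width 2.5
print(1.8, g(1.8, 1.0))   # bin-width 1
```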
a2) In general, the function obtained using the above approach has discontinuities. What
could be done to obtain a continuous function?
b) What is the main difference between the Gaussian Kernel Density function approach
as described in Section 8.2.2 of the textbook and the k-nearest Neighbor Density
Estimator that has been described in Section 8.2.3?
c) What advantages do you see in using a non-parametric density estimation approach
compared to parametric density estimation approaches, such as using multivariate Gaussians?
15) Computations in Belief Networks / D-separation [11]
Assume that the following Belief Network is given, consisting of nodes A, B, C, D, and
E that can take values of true and false; A and B are the parents of C, and C and E are
the parents of D:

A → C ← B
C → D ← E
a) Using the given probabilities of the probability tables of the above belief network
(D|C,E; C|A,B; A; B; E) give a formula to compute P(D|A). Justify all nontrivial steps
you used to obtain the formula!
b) Using the given probabilities of the probability tables of the above belief network
(D|C,E; C|A,B; A; B; E) give a formula to compute P(E|A,B). Justify all nontrivial steps
you used to obtain the formula!
c) Are C and E independent; are C|∅ and E|∅ d-separable? Give a reason for your answer!
(∅ denotes “no evidence given”.)
d) Is E|CD d-separable from A|CD? Give a reason for your answer!
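For a) the factorization P(D|A) = Σ_{B,C,E} P(B) P(C|A,B) P(E) P(D|C,E) can be checked numerically by enumeration. The CPT entries below are placeholders (the real values come from the network’s probability tables) and should be substituted:

```python
from itertools import product

# Hypothetical CPT values for illustration only -- replace with the given tables.
P_B = {True: 0.6, False: 0.4}
P_E = {True: 0.1, False: 0.9}
P_C = {(True, True): 0.9, (True, False): 0.5,
       (False, True): 0.7, (False, False): 0.1}   # P(C=true | A, B)
P_D = {(True, True): 0.8, (True, False): 0.6,
       (False, True): 0.4, (False, False): 0.05}  # P(D=true | C, E)

def p_d_given_a(a):
    # Sum out the hidden variables B, C, E; A is fixed by the conditioning,
    # so P(A) cancels and does not appear in the sum.
    total = 0.0
    for b, c, e in product([True, False], repeat=3):
        pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
        total += P_B[b] * pc * P_E[e] * P_D[(c, e)]
    return total

print(p_d_given_a(True))
```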
16) Using Hidden Markov Model Tools
Assume the following Hidden Markov Model (HMM) is given:
a) What is the probability of the following 3 DNA sequences?
i. CTCTGTTTT
ii. CGGGGAGTT
iii. CACTCTCGG
b) What is the most likely state path for each of the above 3 sequences?
Interpret the answers you obtained—do they make sense?
Remark: using any HMM tool to obtain answers to these questions is fine!
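Such sequence probabilities are what the forward algorithm computes; below is a sketch with a generic two-state DNA model whose states, transition, and emission probabilities are all placeholders to be replaced by the HMM given in the assignment:

```python
# Forward algorithm sketch: P(sequence) = sum over all state paths.
states = ["S1", "S2"]
start = {"S1": 0.5, "S2": 0.5}
trans = {"S1": {"S1": 0.9, "S2": 0.1},
         "S2": {"S1": 0.2, "S2": 0.8}}
emit = {"S1": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "S2": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

def forward(seq):
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for sym in seq[1:]:
        alpha = {s: sum(alpha[q] * trans[q][s] for q in states) * emit[s][sym]
                 for s in states}
    return sum(alpha.values())

for seq in ("CTCTGTTTT", "CGGGGAGTT", "CACTCTCGG"):
    print(seq, forward(seq))
```

The most likely state path of part b) would instead use the Viterbi algorithm, which replaces the sum over previous states by a max.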
17) Support Vector Machines
a) Why do support vector machine approaches usually map examples to a higher-dimensional
space?
b) The support vector regression approach minimizes the objective function given below.
Give a verbal description of what this objective function minimizes! What purpose does ε
serve? What purpose does C serve?

min ½||w||² + C Σ_t (ξ+^t + ξ−^t)  subject to:
  r^t − (w^T x^t + w0) ≤ ε + ξ+^t
  (w^T x^t + w0) − r^t ≤ ε + ξ−^t
  ξ+^t, ξ−^t ≥ 0
for t = 1,…,n
c) Assume you apply support vector regression to a particular problem and for the
obtained hyperplane the slack variables ξ+^t and ξ−^t are all 0 for the n training examples
(t = 1,…,n); what does this mean?
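The all-slack-zero situation of c) can be reproduced concretely; the sketch below (assuming scikit-learn’s SVR, with arbitrarily chosen data, C, and ε) fits noiseless linear data with a tube wide enough that every training residual stays within ε:

```python
# Sketch: when all slack variables are 0, every training example lies inside
# the epsilon-tube around the fitted hyperplane.
import numpy as np
from sklearn.svm import SVR

X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0  # noiseless linear target

model = SVR(kernel="linear", C=10.0, epsilon=0.5).fit(X, y)
residuals = np.abs(model.predict(X) - y)
print(residuals.max())  # maximum training residual; stays within the tube
```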