Uploaded by 紫薇清竹

Machine Learning Course Project Problems

1. Among the supervised-learning algorithms studied thus far, the support vector machine
stands out as the most powerful. In this problem, the performance of the support vector
machine is to be challenged by using it to classify the two multicircular regions that constitute
the “tightly fisted” structure shown in Fig. 1. The radii of the three concentric circles in this
figure are $d_1 = 0.2$, $d_2 = 0.5$, $d_3 = 0.8$.
(a) Generate 100 epochs, each of which consists of 200 randomly distributed training
examples, and an equal number of test data for the two regions of Fig. 1.
(b) Train a support vector machine, assigning the value $C = 500$. Hence, construct the
decision boundary computed by the machine.
(c) Test the network and thereby determine the classification error rate.
(d) Repeat the experiment for $C = 100$ and $C = 2500$. Comment on your results.
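Parts (a) and (c) can be sketched without committing to a particular SVM library. The following is a minimal sketch under stated assumptions, not a reference solution: the class layout in `region_label` (rings alternating between the two classes) and the sampling square $[-1, 1]^2$ are placeholders, because the geometry of Fig. 1 is not reproduced in the text. The names `region_label`, `generate_epoch`, and `error_rate` are hypothetical helpers introduced only for illustration.

```python
import math
import random

D1, D2, D3 = 0.2, 0.5, 0.8  # radii given in the problem statement


def region_label(x, y):
    """ASSUMED placeholder rule: rings alternate between the two classes.

    Replace this with the actual geometry of Fig. 1.
    """
    r = math.hypot(x, y)
    ring = sum(r >= d for d in (D1, D2, D3))  # 0, 1, 2, or 3
    return 1 if ring % 2 == 0 else -1


def generate_epoch(n=200, rng=random):
    """Draw n points uniformly from the square [-1, 1]^2 (an assumption)."""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, region_label(*p)) for p in points]


def error_rate(predict, data):
    """Fraction of examples whose predicted label differs from the true one."""
    wrong = sum(predict(p) != label for p, label in data)
    return wrong / len(data)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
train_epochs = [generate_epoch(200, rng) for _ in range(100)]
test_epochs = [generate_epoch(200, rng) for _ in range(100)]
```

The prediction function of a trained support vector machine would be passed as `predict`, once for each value of $C$, so that the error rates from parts (c) and (d) are measured on the same test data.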
Fig. 1
2. The purpose of this computer experiment is to investigate the clustering process performed
by the K-means algorithm. To provide insight into the experiment, we fix the number of
clusters $K = 6$, but vary the vertical separation between the two moons in Fig. 2. Specifically,
the requirement is to do the following, using an unlabeled training sample of 1,000 data
points picked randomly from the two regions of the double-moon pictured in Fig. 2:
(a) Experimentally, determine the mean $\mu_j$ and variance $\sigma_j^2$, $j = 1, 2, \ldots, 6$, for the sequence
of eight uniformly spaced vertical separations starting at $d = 1$ and reducing them by one
till the final separation $d = -6$ is reached.
(b) In light of the results obtained in part (a), comment on how the mean $\mu_j$ of cluster
$j \in \{1, 2, 3\}$ is affected by reducing the separation $d$.
(c) Plot the variance $\sigma_j^2$ versus the separation $d$ for $j = 1, 2, \ldots, 6$.
(d) Compare the common $\sigma^2$ computed in accordance with the empirical formula
$\sigma = d_{\max} / \sqrt{2K}$
with the trends exhibited in the plots obtained in part (c), where $d_{\max}$ is the maximum
distance between the $K$ means.
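The quantities requested in parts (a) through (d) can be computed along the following lines. This is a minimal sketch under stated assumptions, not a definitive implementation: the moon radius `r` and width `w` are assumed values (the text does not give them), the per-cluster variance is assumed to be the mean squared Euclidean distance to the cluster mean, and `double_moon`, `kmeans`, `cluster_variance`, and `common_sigma` are hypothetical helper names. The K-means routine is a plain Lloyd iteration with random initialization.

```python
import math
import random
from itertools import combinations


def double_moon(n, d, r=10.0, w=6.0, rng=random):
    """Sample n unlabeled points; r and w are ASSUMED (not given in the text).

    The radius is drawn uniformly, so the density is only roughly uniform in area.
    """
    pts = []
    for _ in range(n):
        rho = rng.uniform(r - w / 2, r + w / 2)
        theta = rng.uniform(0.0, math.pi)
        if rng.random() < 0.5:   # upper moon, centred at the origin
            pts.append((rho * math.cos(theta), rho * math.sin(theta)))
        else:                    # lower moon, shifted right by r and down by d
            pts.append((rho * math.cos(theta) + r, -rho * math.sin(theta) - d))
    return pts


def kmeans(points, k, rng=random, max_iter=50):
    """Plain Lloyd iterations; returns (means, clusters)."""
    means = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, means[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous mean.
        new_means = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else means[j]
            for j, members in enumerate(clusters)
        ]
        if new_means == means:  # assignments have stopped changing
            break
        means = new_means
    return means, clusters


def cluster_variance(mean, members):
    """ASSUMED definition: mean squared Euclidean distance to the cluster mean."""
    return sum(math.dist(p, mean) ** 2 for p in members) / len(members)


def common_sigma(means):
    """sigma = d_max / sqrt(2K), with d_max the largest distance between means."""
    d_max = max(math.dist(a, b) for a, b in combinations(means, 2))
    return d_max / math.sqrt(2 * len(means))


rng = random.Random(0)
separations = [1 - i for i in range(8)]  # eight values, starting at 1, step -1
for d in separations:
    means, clusters = kmeans(double_moon(1000, d, rng=rng), k=6, rng=rng)
    variances = [cluster_variance(m, c) for m, c in zip(means, clusters) if c]
    print(d, [round(v, 2) for v in variances], round(common_sigma(means), 2))
```

Because K-means depends on its initialization, the cluster indices are not guaranteed to refer to the same region from one separation to the next; sorting the means (for example by horizontal coordinate) before comparing them across values of $d$ is one way to keep the plots in part (c) interpretable.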
Fig. 2