Homework 4

CSE 802, Spring 2012
Due: Feb 27, 2012
1.
(a) Compute the $2 \times 2$ scatter matrix $S$ for this data:
$$S = \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})^T,$$
where $x_j$ is the $j$-th sample (row) and $\bar{x}$ is the mean vector calculated using all the samples. Under these definitions:
$$\bar{x} = \begin{pmatrix} 5.7100 & 3.1600 \end{pmatrix}, \qquad S = \begin{pmatrix} 7.329 & -0.706 \\ -0.706 & 1.624 \end{pmatrix}$$
(b) Setting $|S - \lambda I| = 0$:
$$\begin{vmatrix} 7.329 - \lambda & -0.706 \\ -0.706 & 1.624 - \lambda \end{vmatrix} = 0$$
Expanding the determinant gives the characteristic equation
$$\lambda^2 - 8.9530\,\lambda + (11.9023 - 0.4984) = 0,$$
whose roots are $\lambda_1 = 7.4151$ and $\lambda_2 = 1.5379$.
(c) Solve $S e_i = \lambda_i e_i$ such that $\|e_i\| = 1$:
$$S e_1 = \lambda_1 e_1: \quad \begin{pmatrix} 7.329 & -0.706 \\ -0.706 & 1.624 \end{pmatrix} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = 7.4151 \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} \;\Rightarrow\; e_1 = \begin{pmatrix} 0.9927 \\ -0.1210 \end{pmatrix}$$
$$S e_2 = \lambda_2 e_2: \quad \begin{pmatrix} 7.329 & -0.706 \\ -0.706 & 1.624 \end{pmatrix} \begin{pmatrix} r_3 \\ r_4 \end{pmatrix} = 1.5379 \begin{pmatrix} r_3 \\ r_4 \end{pmatrix} \;\Rightarrow\; e_2 = \begin{pmatrix} 0.1210 \\ 0.9927 \end{pmatrix}$$
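These values can be checked numerically; a minimal NumPy sketch (the check itself is an addition, not part of the original solution):

```python
import numpy as np

S = np.array([[7.329, -0.706],
              [-0.706, 1.624]])

# eigh handles symmetric matrices: eigenvalues come back in ascending order,
# and the columns of `vecs` are the corresponding unit-norm eigenvectors
# (possibly with flipped signs relative to the hand computation).
vals, vecs = np.linalg.eigh(S)
print(vals)   # approximately [1.5379, 7.4151]
print(vecs)   # columns approximately (0.1210, 0.9927) and (0.9927, -0.1210)
```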
(d) [Figure: the two eigenvectors plotted over the mean-subtracted data. Blue lines are the eigenvectors; red points are the mean-subtracted data points.]
2. (a) Since each pattern class has a multivariate Gaussian density with an unknown mean vector and a common but unknown covariance matrix, we estimate the mean of each class as the mean of the training patterns for that class. The common covariance matrix is computed by first computing the covariance matrix of each of the three classes and then averaging them. The plug-in classifier for this data gives an average error rate over 15 rounds of cross-validation of 0.0248, with a variance of the error rate of 0.00051995. The results of the individual runs are listed in Table 1.
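A minimal sketch of the plug-in rule just described, assuming the training features are in an array `X` with one row per pattern and integer labels in `y`; the function names and the equal-prior assumption are illustrative, not from the original solution:

```python
import numpy as np

def fit_plugin_linear(X, y):
    """Per-class means plus one pooled covariance (averaged over classes)."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Average the class covariance matrices into the common covariance.
    cov = np.mean([np.cov(X[y == c], rowvar=False) for c in classes], axis=0)
    return classes, means, np.linalg.inv(cov)

def predict_linear(x, classes, means, cov_inv):
    """With a shared covariance and equal priors, the decision boundary is linear."""
    def score(c):
        d = x - means[c]
        return -0.5 * d @ cov_inv @ d   # Gaussian log-likelihood up to a constant
    return max(classes, key=score)
```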
(b) Since each class has its own mean vector and covariance matrix, we separate the data by class and estimate the parameters for each pattern class individually. Because the covariance matrices differ, the resulting decision boundary is quadratic. The average error rate over these 15 rounds of cross-validation is 0.0353, and the variance of the error rate is 0.00061515. The results of the individual runs are listed in Table 1.
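The only change from the sketch in (a) is that each class keeps its own covariance matrix, so the log-determinant term no longer cancels; a hedged sketch under the same assumptions:

```python
import numpy as np

def fit_quadratic(X, y):
    """Per-class mean and covariance; no pooling, so boundaries are quadratic."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False)
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov),
                     np.linalg.slogdet(cov)[1])
    return params

def predict_quadratic(x, params):
    def score(c):
        mean, cov_inv, logdet = params[c]
        d = x - mean
        return -0.5 * (d @ cov_inv @ d + logdet)  # equal priors assumed
    return max(params, key=score)
```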
(c) For the Parzen window classifier (a sketch of the rule follows the results):
Window size 0.01: average error rate 0.6549, variance of error rate 0.0001.
Window size 0.5: average error rate 0.0549, variance of error rate 0.0006.
Window size 10.0: average error rate 0.4902, variance of error rate 0.0010.
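A minimal sketch of a Parzen window classifier with width `h`, assuming a Gaussian kernel (the kernel shape is an assumption; this excerpt does not state it) and the same `X`, `y` layout as above:

```python
import numpy as np

def parzen_scores(x, X_train, y_train, h):
    """Gaussian-kernel Parzen density estimate for each class at the point x.
    The Gaussian kernel is an assumption, not stated in the original solution."""
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        sq_dists = np.sum((Xc - x) ** 2, axis=1)
        scores[c] = np.mean(np.exp(-sq_dists / (2.0 * h ** 2)))
    return scores

def classify_parzen(x, X_train, y_train, h=0.5):
    scores = parzen_scores(x, X_train, y_train, h)
    return max(scores, key=scores.get)
```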
(d) Comparing the results in (a) and (b), we find that the quadratic classifier performs a little worse than the linear classifier on the IRIS data, so a more complicated decision boundary does not necessarily lead to better performance. One reason might be that the linear classifier uses more data to estimate the (pooled) covariance matrix, so the resulting variance of the prediction error is smaller.
Comparing the results of the Parzen classifier across window sizes, we see that the error rate is strongly influenced by the window width: a width of 0.5 performs far better than 0.01 or 10.
Comparing the result of (c) with (a) and (b), we see that using 0.5 as the window size achieves performance comparable to the methods in (a) and (b): the error rates are all small, roughly in the range of 2%-6%.
The following table lists the running time for the three methods, including the time to load the data and the computation time for 10 rounds of cross-validation. [Running-time table not reproduced in this extract.] From the table we see that the running time grows as the computational cost of the classifier increases.
While selecting an appropriate window size takes some effort, the advantage of the Parzen window classifier over the other methods is that, being non-parametric, it does not need to assume that the underlying distribution is Gaussian.
3.
(a) A 2-D plot of the IRIS data using PCA is shown in Figure 1; the corresponding plot using MDA is shown in Figure 2. MDA provides better separation than PCA, since MDA uses the class labels when projecting the data.
(b) Percentage variance is the fraction of the total variance retained by the eigenvalues kept in the PCA projection, i.e., the sum of the retained eigenvalues divided by the sum of all eigenvalues. For the 2-D projection of the IRIS data the value is 0.9776.
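A minimal sketch of this computation, assuming the four IRIS features are stored in an array `X` with one row per sample (the eigendecomposition mirrors problem 1):

```python
import numpy as np

def pca_percentage_variance(X, k=2):
    """Fraction of total variance captured by the k largest eigenvalues
    of the sample covariance matrix."""
    cov = np.cov(X, rowvar=False)
    vals = np.linalg.eigvalsh(cov)      # eigenvalues in ascending order
    return vals[-k:].sum() / vals.sum()
```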
(c) We assume the following three categories of the IRIS data:
1: setosa.
2: versicolor.
3: virginica.
The discriminant function for class 1 is:
$$g_1(x) = -22\,x_1^2 + 24\,x_1 x_2 - 15\,x_2^2 + 14\,x_1 + 22\,x_2 - 27$$
The discriminant function for class 2 is:
$$g_2(x) = -8.2\,x_1^2 + 4.8\,x_1 x_2 - 9.8\,x_2^2 + 8.4\,x_1 + 27\,x_2 - 24$$
The discriminant function for class 3 is:
$$g_3(x) = -7.1\,x_1^2 + 2.8\,x_1 x_2 - 5.2\,x_2^2 + 23\,x_1 + 15\,x_2 - 35$$
The error rate for this classifier is 0.0267. Note that when computing the error rate, you need to evaluate all three discriminant functions for each sample and assign the sample to the class with the largest value (a sketch of this rule follows). Compared with problem 2(b), this error rate is smaller. This is because the 2-D MDA projection maps the data into a linear subspace chosen to separate the true categories as well as possible.
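A minimal sketch of the arg-max rule over the three fitted discriminants, where (x1, x2) are the MDA-projected coordinates of a sample:

```python
def g1(x1, x2):
    return -22*x1**2 + 24*x1*x2 - 15*x2**2 + 14*x1 + 22*x2 - 27

def g2(x1, x2):
    return -8.2*x1**2 + 4.8*x1*x2 - 9.8*x2**2 + 8.4*x1 + 27*x2 - 24

def g3(x1, x2):
    return -7.1*x1**2 + 2.8*x1*x2 - 5.2*x2**2 + 23*x1 + 15*x2 - 35

def classify(x1, x2):
    """Evaluate all three discriminants and return the winning class label."""
    scores = {1: g1(x1, x2), 2: g2(x1, x2), 3: g3(x1, x2)}
    return max(scores, key=scores.get)
```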