
THE HONG KONG POLYTECHNIC UNIVERSITY
Department of Electronic & Information Engineering
EIE520 Neural Computation
Lab 2: 2-D Vowel Recognition using Kernel-Based Neural Networks
A. Introduction:
There are many methods to perform pattern classification using neural networks.
Among these methods, Gaussian Mixture Models (GMMs) and Radial Basis Function
(RBF) networks are two of the most promising neural models for this task. In this laboratory
exercise, your task is to develop pattern classification systems based on GMMs and
RBF networks. The systems should be able to recognize 10 vowels. You will use
“netlab” [5] together with Matlab to create the GMMs and RBF networks.
B. Objectives:
You should have completed the following tasks by the end of this laboratory exercise:
1. Create GMMs and RBF networks to represent 10 different classes.
2. Perform pattern classification using the created networks.
3. Compare the GMM-based system against the RBF network-based system in terms of
recognition accuracy, decision boundaries, training time, and recognition time by
varying the number of kernels.
4. Find the decision boundaries and plot them on a 2-D plane.
C. Background:
C.1 GMM-Based Classifier
[Figure 1 shows a bank of $K$ class-conditional GMMs feeding a MAXNET: each class $\omega_i$ produces a likelihood $p(\mathbf{x}_t|\omega_i)$ by mixing $R$ component densities $p(\mathbf{x}_t|\omega_i,\Lambda_{r|i})$ with weights $P(\Lambda_{r|i}|\omega_i)$, and the MAXNET selects $k = \arg\max_{1 \le i \le K} p(\mathbf{x}_t|\omega_i)$ for each $D$-dimensional input $\mathbf{x}_t$.]
Figure 1. Architecture of a GMM-based classifier
Figure 1 depicts the architecture of a K-class classifier in which each class is
represented by a Gaussian mixture model (GMM). GMMs make use of
semi-parametric techniques for approximating probability density functions (pdf).
The output of a GMM is the weighted sum of R component densities, as shown in
Figure 1. Given a set of $N$ independent and identically distributed patterns
$X^{(i)} = \{\mathbf{x}_t;\ t = 1, 2, \ldots, N\}$ associated with class $\omega_i$, we assume that the class
likelihood function $p(\mathbf{x}_t|\omega_i)$ is a mixture of Gaussian distributions, i.e.,
$$p(\mathbf{x}_t|\omega_i) = \sum_{r=1}^{R} P(\Lambda_{r|i}|\omega_i)\, p(\mathbf{x}_t|\omega_i, \Lambda_{r|i}) \qquad (1)$$
where $\Lambda_{r|i}$ represents the parameters of the $r$-th mixture component, $R$ is the total
number of mixture components, $p(\mathbf{x}_t|\omega_i, \Lambda_{r|i}) = \mathcal{N}(\boldsymbol{\mu}_{r|i}, \boldsymbol{\Sigma}_{r|i})$ is the probability
density function of the $r$-th component, and $P(\Lambda_{r|i}|\omega_i)$ is the prior probability (also
called the mixture coefficient) of the $r$-th component. Here $\mathcal{N}(\boldsymbol{\mu}_{r|i}, \boldsymbol{\Sigma}_{r|i})$ denotes a
Gaussian distribution with mean vector $\boldsymbol{\mu}_{r|i}$ and covariance matrix $\boldsymbol{\Sigma}_{r|i}$. The mean
vectors $\{\boldsymbol{\mu}_{r|i}\}_{r=1}^{R}$, covariance matrices $\{\boldsymbol{\Sigma}_{r|i}\}_{r=1}^{R}$, and mixture coefficients
$\{P(\Lambda_{r|i}|\omega_i)\}_{r=1}^{R}$ are typically estimated by the EM algorithm.
More specifically, the parameters of a GMM are estimated iteratively by
$$\boldsymbol{\mu}_{r|i}^{(j+1)} = \frac{\sum_{t=1}^{N} P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t)\,\mathbf{x}_t}{\sum_{t=1}^{N} P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t)}, \quad
\boldsymbol{\Sigma}_{r|i}^{(j+1)} = \frac{\sum_{t=1}^{N} P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t)\,[\mathbf{x}_t - \boldsymbol{\mu}_{r|i}^{(j+1)}][\mathbf{x}_t - \boldsymbol{\mu}_{r|i}^{(j+1)}]^{T}}{\sum_{t=1}^{N} P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t)}, \text{ and}$$
$$P^{(j+1)}(\Lambda_{r|i}) = \frac{1}{N}\sum_{t=1}^{N} P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t), \qquad r = 1, \ldots, R \qquad (2)$$
where $j$ denotes the iteration index, $P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t)$ is the posterior probability of the
$r$-th mixture ($r = 1, \ldots, R$), and $T$ denotes matrix transpose. The posterior
probability can be obtained by Bayes' theorem, yielding
$$P^{(j)}(\Lambda_{r|i}|\mathbf{x}_t) = \frac{P^{(j)}(\Lambda_{r|i})\, p^{(j)}(\mathbf{x}_t|\Lambda_{r|i})}{\sum_{k=1}^{R} P^{(j)}(\Lambda_{k|i})\, p^{(j)}(\mathbf{x}_t|\Lambda_{k|i})} \qquad (3)$$
in which
$$p^{(j)}(\mathbf{x}_t|\Lambda_{r|i}) = \frac{1}{(2\pi)^{D/2}\,\bigl|\boldsymbol{\Sigma}_{r|i}^{(j)}\bigr|^{1/2}}\exp\Bigl\{-\tfrac{1}{2}\bigl[\mathbf{x}_t - \boldsymbol{\mu}_{r|i}^{(j)}\bigr]^{T}\bigl(\boldsymbol{\Sigma}_{r|i}^{(j)}\bigr)^{-1}\bigl[\mathbf{x}_t - \boldsymbol{\mu}_{r|i}^{(j)}\bigr]\Bigr\} \qquad (4)$$
where D is the input dimension.
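For illustration, the Matlab sketch below carries out one EM iteration of Eqs. (2)-(4) for a single class with full covariance matrices. The variable names X (an N-by-D data matrix), mu, Sigma and prior are assumptions made for this sketch only; in the laboratory itself, netlab's gmmem performs these updates for you.

% One EM iteration for a single class (hedged sketch; X, mu, Sigma, prior are assumed)
[N, D] = size(X);
R = size(mu, 1);
post = zeros(N, R);                              % E-step: posteriors P(Lambda_r | x_t), Eqs. (3)-(4)
for r = 1:R
    Xc = X - repmat(mu(r,:), N, 1);              % x_t - mu_r
    expo = -0.5 * sum((Xc / Sigma(:,:,r)) .* Xc, 2);
    dens = exp(expo) / ((2*pi)^(D/2) * sqrt(det(Sigma(:,:,r))));
    post(:,r) = prior(r) * dens;
end
post = post ./ repmat(sum(post, 2), 1, R);       % normalise over the R components
for r = 1:R                                      % M-step: re-estimate parameters, Eq. (2)
    w = post(:,r);
    mu(r,:) = (w' * X) / sum(w);
    Xc = X - repmat(mu(r,:), N, 1);
    Sigma(:,:,r) = (Xc' * (Xc .* repmat(w, 1, D))) / sum(w);
    prior(r) = sum(w) / N;
end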
C.2 RBF Network-Based Classifier
In multi-layer perceptrons, the hidden neurons are based on linear basis function
(LBF) nodes. Another type of hidden neuron is the radial basis function (RBF)
neuron, which is the building block of RBF neural networks. In an RBF
network, each neuron in the hidden layer is composed of a radial basis function that
also serves as an activation function. The weighting parameters in an RBF network
are the centres and the widths of these neurons. The output functions are the linear
combination of these radial basis functions. A more general form of the RBF
networks is the elliptical basis function (EBF) networks where the hidden neurons
compute the Mahalanobis distance between the centres and the input vectors. It has
been shown that RBF networks have the same asymptotic approximation power as
multi-layer perceptrons.
[Figure 2 shows a network with $D$ inputs, $M$ Gaussian basis functions $\Phi_1(\mathbf{x}(t)), \ldots, \Phi_M(\mathbf{x}(t))$ organised into $K$ groups (one group per class), and $K$ outputs $y_1(\mathbf{x}(t)), \ldots, y_K(\mathbf{x}(t))$, each formed from the basis-function outputs through weights $\lambda_{kj}$ and a bias $\lambda_{k0}$.]
Figure 2. Architecture of a K-output EBF network
To apply RBF/EBF networks for pattern classification, each class is assigned a
group of hidden units, and each group is trained independently using the data from
the corresponding class. Figure 2 depicts the architecture of an RBF/EBF network
with D inputs, M basis functions (hidden nodes), and K outputs. The input layer
distributes the D-dimensional input patterns, $\mathbf{x}_t$, to the hidden layer. Each hidden
unit is a Gaussian basis function of the form
$$\Phi_j(\mathbf{x}_t) = \exp\Bigl\{-\frac{1}{2\gamma_j}(\mathbf{x}_t - \boldsymbol{\mu}_j)^{T}\boldsymbol{\Sigma}_j^{-1}(\mathbf{x}_t - \boldsymbol{\mu}_j)\Bigr\}, \qquad j = 1, \ldots, M \qquad (5)$$
where $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ are the mean vector and covariance matrix of the $j$-th basis
function respectively, and $\gamma_j$ is a smoothing parameter controlling the spread of the
$j$-th basis function. The $k$-th output is a linear weighted sum of the basis functions'
outputs, i.e.,
mwmak03/eie520/lab2
17/02/2016
4
$$y_k(\mathbf{x}_t) = \lambda_{k0} + \sum_{j=1}^{M} \lambda_{kj}\,\Phi_j(\mathbf{x}_t), \qquad t = 1, \ldots, N \text{ and } k = 1, \ldots, K \qquad (6)$$
where $\mathbf{x}_t$ is the $t$-th input vector and $\lambda_{k0}$ is a bias term.
In matrix form, (6) can be written as $\mathbf{Y} = \boldsymbol{\Phi}\mathbf{W}$, where $\mathbf{Y}$ is an $N \times K$ matrix of network
outputs, $\boldsymbol{\Phi}$ is an $N \times (M+1)$ matrix of basis-function outputs (including the bias column),
and $\mathbf{W}$ is an $(M+1) \times K$ weight matrix. The weight matrix $\mathbf{W}$ is the least squares
solution of the matrix equation
$$\boldsymbol{\Phi}\mathbf{W} = \mathbf{D} \qquad (7)$$
where $\mathbf{D}$ is an $N \times K$ target matrix containing the desired output vectors in its rows.
As $\boldsymbol{\Phi}$ is not a square matrix, one reliable way to solve (7) is to use the technique of
singular value decomposition. In this approach, the matrix $\boldsymbol{\Phi}$ is decomposed into
the product $\mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{T}$, where $\mathbf{U}$ is an $N \times (M+1)$ column-orthogonal matrix, $\boldsymbol{\Lambda}$ is an
$(M+1) \times (M+1)$ diagonal matrix containing the singular values, and $\mathbf{V}$ is an
$(M+1) \times (M+1)$ orthogonal matrix. The weight vectors $\{\mathbf{w}_k\}_{k=1}^{K}$ are given by
$$\mathbf{w}_k = \mathbf{V}\boldsymbol{\Lambda}^{-1}\mathbf{U}^{T}\mathbf{d}_k \qquad (8)$$
where $\mathbf{d}_k$ is the $k$-th column of $\mathbf{D}$. For an over-determined system, singular value
decomposition gives a solution that is the best approximation in the least squares
sense.
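As an illustration of Eqs. (5)-(8), the Matlab sketch below computes the matrix $\boldsymbol{\Phi}$ for Gaussian basis functions with diagonal covariances and then solves for $\mathbf{W}$ by singular value decomposition. The variable names (X, mu, sigma2, gam, Dmat) are assumptions for this sketch only; netlab's rbftrain solves the same least-squares problem internally.

% Hidden-layer outputs and least-squares weights (hedged sketch)
[N, D] = size(X);                          % X:      N-by-D training inputs
M = size(mu, 1);                           % mu:     M-by-D centres
Phi = ones(N, M+1);                        % first column is the bias input
for j = 1:M
    Xc = X - repmat(mu(j,:), N, 1);
    d2 = sum((Xc.^2) ./ repmat(sigma2(j,:), N, 1), 2);   % Mahalanobis distance (diagonal Sigma_j)
    Phi(:, j+1) = exp(-d2 / (2*gam(j)));                  % Eq. (5)
end
[U, S, V] = svd(Phi, 'econ');              % Phi = U*S*V'
W = V * (S \ (U' * Dmat));                 % Eqs. (7)-(8); Dmat is the N-by-K target matrix
Y = Phi * W;                               % network outputs, Eq. (6)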
D. Procedures:
D.1 GMM-Based Classifier
1) Download the “netlab” software from http://www.ncrg.aston.ac.uk/netlab/ and save the
m-files in your working directory. Download the training data and testing data from
M.W. Mak’s home page http://www.eie.polyu.edu.hk/~mwmak/SA/2DVowel.zip.
This is the vowel data you will use in this laboratory exercise.
2) Open Matlab, go to “File” -> “Set Path” and add the directory where “netlab” was
saved.
3) Import and save the training data, 2DVowel_train.pat, to a 2D array. The imported
matrix should be 33812 in size. The first 2 columns contain training feature vectors in
a 2-D input space and the 3rd to 12th columns indicate the class to which each pattern
belongs.
4) After importing the data, you create and initialise the GMMs using the functions gmm
and gmminit, respectively. Set the number of centres to 3 and the covariance type to
“Diagonal” first. The model can be created by using
model_name=gmm(data dimension, no of centres, covariance type)
and the model can be initialized by
model_name=gmminit(model_name, data, options)
5) Then, you use the EM algorithm gmmem to train the models. An example program,
creation_sample.m, is provided to demonstrate this training process.
6) Create 10 GMMs to represent the 10 vowels using the data from 10 different classes. It
is recommended to separate the data into 10 files.
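A minimal sketch of Steps 4 to 6 is given below. It assumes the imported 338 × 12 training matrix is stored in a variable called train and that the class indicators in columns 3-12 use 1-of-10 coding; these names and choices are illustrative only.

% One GMM per vowel class (hedged sketch)
ncentres = 3;
options = zeros(1, 18);
options(14) = 30;                          % maximum number of EM iterations
models = cell(1, 10);
for c = 1:10
    idx = train(:, 2 + c) > 0.5;           % rows belonging to class c (assumed 1-of-10 coding)
    data = train(idx, 1:2);                % 2-D feature vectors of this class
    mix = gmm(2, ncentres, 'diag');        % 'diag' is netlab's name for diagonal covariances
    mix = gmminit(mix, data, options);     % k-means initialisation
    mix = gmmem(mix, data, options);       % EM training
    models{c} = mix;
end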
7) Plot the imported training data together with the centres after EM training.
8) Now, import the testing data, 2DVowel_test.pat. This file is for you to test the
classification rate of the GMMs you have just created. The file contains 333 data points,
and again each point belongs to one of the 10 classes. For a given data point, the
probability that it belongs to a particular class can be calculated by the function
gmmprob. Each data point is classified to the class whose corresponding likelihood is
the highest. The overall classification rate is calculated by:
$$\text{Classification rate} = \frac{\text{Number of correctly classified points}}{\text{Total number of data points}} \times 100\%$$
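A hedged sketch of this step is shown below, assuming the 333 × 12 test matrix is stored in test and the 10 trained GMMs are held in the cell array models from the previous sketch.

% Classify the test set with the GMMs (hedged sketch)
xtest = test(:, 1:2);
[dummy, labels] = max(test(:, 3:12), [], 2);    % true class of each test point
like = zeros(size(xtest, 1), 10);
for c = 1:10
    like(:, c) = gmmprob(models{c}, xtest);     % likelihood of each point under class c
end
[dummy, predicted] = max(like, [], 2);          % pick the class with the highest likelihood
rate = 100 * sum(predicted == labels) / length(labels);
fprintf('GMM classification rate = %.2f%%\n', rate);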
9) Now, try different numbers of centres and different covariance types (Diagonal and
Full) when creating the models. Find the optimal combination that gives the highest
classification rate. What is the optimal combination and what is the classification rate?
10) Suppose you have already obtained the optimal number of centres and covariance type
for the models. Now you can start finding the decision boundaries. The decision
boundaries separate the 10 classes in a 2-D space. You can do this by finding the class
difference in the x-y plane; a sample program, decision_boundary_sample.m, is
included in this exercise to demonstrate the procedure of finding the boundaries. What
happens at the edges of the x-y plane? Explain the phenomenon.
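Besides decision_boundary_sample.m, one possible way to visualise the boundaries is sketched below: evaluate the winning class on a regular grid and draw contours between neighbouring regions. The grid range used here is an assumption and should be adjusted to cover your feature space.

% Decision regions on a grid (hedged sketch)
[gx, gy] = meshgrid(linspace(-1.5, 1.5, 200));
grid_pts = [gx(:), gy(:)];
like = zeros(size(grid_pts, 1), 10);
for c = 1:10
    like(:, c) = gmmprob(models{c}, grid_pts);
end
[dummy, region] = max(like, [], 2);                      % winning class at each grid point
contour(gx, gy, reshape(region, size(gx)), 1.5:1:9.5);   % boundaries between classes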
D.2 RBF Network-Based Classifier
1. In this part, you will repeat D.1 but using RBF networks. Again, you should start with
importing the training data into a 338 × 12 array.
2. After importing the data, you should separate it into 2 parts: one is the data part, which is
338 × 2 in size, and the other is the class ID part, which is 338 × 10 in size.
3. Instead of creating 10 different RBF networks as in Part D.1, you should create one RBF
network. To create an RBF network, you use the function rbf. In order to specify the
network architecture, you must provide the number of inputs, the number of hidden
units, and the number of output units.
4. After that, you initialise the RBF network by calling the function rbfsetbf. You
need to specify a number of option fields as in gmm in Part D.1. Before performing
classification, call the function rbftrain to train the RBF network you have just
created. You have to specify the targets, which contain the class information. At
this stage, all the training processes should have finished and the next step is to do
classification.
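A possible realisation of Steps 1 to 4 is sketched below, again assuming the training matrix is stored in train; the choice of 10 hidden units is only a starting point for the experiments in Step 6.

% Create, initialise and train the RBF network (hedged sketch)
x = train(:, 1:2);                      % 338-by-2 input vectors
t = train(:, 3:12);                     % 338-by-10 target (class) matrix
nhidden = 10;
net = rbf(2, nhidden, 10, 'gaussian');  % 2 inputs, nhidden basis functions, 10 outputs
options = zeros(1, 18);
options(14) = 30;                       % iterations for fitting the basis functions
net = rbfsetbf(net, options, x);        % set centres and widths from the data
net = rbftrain(net, options, x, t);     % least-squares training of the output weights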
5. Now, import the testing data and use the function rbffwd to perform classification.
This function has 2 input arguments: one is the RBF network that will be used for
classification and the other is a row vector. In this exercise, this row vector should
have 2 fields: the x location and the y location. The output is again a row vector whose
size will be equal to the number of outputs that you specified in Step 3. For each test
vector, the class ID is determined by selecting the network output whose response to the
test vector is the highest.
6. Compute the classification rate of the whole testing set. Try different numbers of
hidden units and select the optimal one. What is the optimal number of hidden units and
what is the corresponding classification rate? Compare and explain the classification
performance of the RBF networks with that of the GMMs.
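A hedged sketch of Steps 5 and 6 is given below; it forwards the whole test set through the network at once and assumes the test matrix is stored in test.

% Classify the test set with the RBF network (hedged sketch)
xtest = test(:, 1:2);
[dummy, labels] = max(test(:, 3:12), [], 2);    % true classes
y = rbffwd(net, xtest);                         % 333-by-10 network outputs
[dummy, predicted] = max(y, [], 2);             % output with the highest response
rate = 100 * sum(predicted == labels) / length(labels);
fprintf('RBF classification rate = %.2f%%\n', rate);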
7. After successfully selecting an optimal number of hidden units, you can plot the decision
boundaries. Again, this can be done by finding the class difference in the x-y plane.
Record your results and compare the whole process with that of the GMMs in terms of
training time and recognition (processing) time.
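For the timing comparison, one simple approach is to wrap the training and recognition calls in tic/toc, as sketched below with the variable names used in the earlier sketches.

% Measure training and recognition time (hedged sketch)
tic;
net = rbftrain(net, options, x, t);             % training time
train_time = toc;
tic;
y = rbffwd(net, xtest);                         % recognition time for the whole test set
recog_time = toc;
fprintf('Training: %.3f s, recognition: %.3f s\n', train_time, recog_time);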
E. References:
1. http://www.ncrg.aston.ac.uk/netlab/
2. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
3. K.K. Yiu, M.W. Mak and C.K. Li, "Gaussian Mixture Models and Probabilistic
Decision-Based Neural Networks for Pattern Classification: A Comparative Study,"
Neural Computing and Applications, Vol. 8, pp. 235-245, 1999.
4. M.W. Mak and S.Y. Kung, "Estimation of Elliptical Basis Function Parameters by the
EM Algorithm with Application to Speaker Verification," IEEE Trans. on Neural
Networks, Vol. 11, No. 4, pp. 961-969, July 2000.
5. Ian T. Nabney, Netlab Algorithms for Pattern Recognition, Springer, 2002.